Interface circuits for modularized data optimization engines and methods therefor

ABSTRACT

A data optimization engine for optimizing selected frames of a first stream of data. The data optimization engine includes a transmit interface circuit coupled to an optimization processor, the transmit interface circuit being configured for receiving the first stream of data. The transmit interface circuit includes a traffic controller circuit for separating frames in the first stream of data into a first optimizable frame and a first non-optimizable frame, and an optimization front-end circuit coupled to the traffic controller circuit to receive at least a first portion of the first optimizable frame. The optimization front-end circuit includes a protocol conversion circuit configured to convert data in the first portion of the first optimizable frame from a first protocol to a second protocol suitable for processing by the optimization processor, the first protocol specifies a first word length, the second protocol specifies a second word length different from the first word length. The optimization front-end circuit further includes an end-of-optimization-file processing circuit, the end-of-optimization-file processing circuit flagging an end of the first portion of the first optimizable frame to the optimization processor, wherein the optimization processor is configured to optimize the first portion of the first optimizable frame by performing at least one of compression and encryption on the first portion of the first optimizable frame.

BACKGROUND OF THE INVENTION

The present invention relates to a data optimization engine for optimizing data transmission bandwidth and storage capacity in electronic systems and computer networks. More particularly, the present invention relates to highly modular data optimization engines, which are designed to be reconfigurable in an efficient and simplified manner to work with different protocols, and methods therefor.

Data transmission links and data storage devices are basic building blocks of modern electronic systems and computer networks. Data transmission links are present in every electronic system and are also fundamental for interconnecting nodes in a computer network. In an electronic system, such as in a computer for example, a data transmission link such as an address bus or a data bus may be employed to transmit digital data between two or more subsystems. Within a computer network (e.g., a local area network, a metro area network, a wide area network, or the Internet), data may be transmitted from one networked device to another via one or more data transmission links using a variety of well-known networking protocols. As is well known, the data transmission links themselves may be implemented using any physical media, such as wireless, copper or fiber optics, and may transfer data in a serial or parallel format.

In modern high-speed electronic systems, the data transmission link has long been regarded as one of the bottlenecks that limit overall system performance. To facilitate discussion of the foregoing, FIG. 1 shows simplified CPU, bus, and memory subsystems within an exemplary computer 100. In a typical computer system, such as in computer 100, a central processing unit (CPU) 102 typically operates at a much higher speed than the speed of a bus 104, which is employed to transmit data between CPU 102 and the various subsystems (such as a memory subsystem 106). By way of example, in some Windows™-based or Unix-based computer systems, it is not unusual to see a CPU having a clock speed in the Gigahertz range being coupled to a data bus running in the low hundreds of Megahertz range. There are many reasons behind the disparity between the CPU speed and the bus clock speed. For one, advances in processor technologies tend to follow the so-called Moore's law, which is popularly taken to mean that the speed of a typical electronic device can be expected to double roughly every 18 months. The clock speed of a typical data or address bus, on the other hand, is limited by the impedance and other physical characteristics of conductive traces that comprise the bus. Thus, it is oftentimes impractical to run these buses at a higher speed to match the speed of the fast CPU due to issues related to power, interference, and the like.

The data storage device, such as a memory subsystem 106 within computer system 100, also represents another bottleneck to higher overall system performance. With regard to memory subsystem 106, there are generally three issues: 1) the speed of data transfer to and from memory subsystem 106, 2) the operating speed of memory subsystem 106, and 3) the storage capacity of memory subsystem 106. With regard to the data transfer speed issue, the discussion above regarding the data transmission link bottleneck applies. With regard to the operating speed of memory subsystem 106, dynamic random access memory (DRAM), which is widely employed for storage of data and instructions during operation, must be refreshed periodically (by a memory controller 108 as shown or by some type of refresh circuitry), and the capacitors employed in the DRAM to store the charges representing the 0's and 1's have a finite response time. Together, these factors tend to limit the speed of a typical DRAM to well below the operating speed of the CPU. Even if static random access memory (SRAM) is employed (assuming the high power consumption and low density issues can be tolerated) in memory subsystem 106, the operating speed of a typical SRAM is also well below that of a typical CPU in computer system 100.

Because of the relatively slow response of memory subsystem 106, attempts have been made, some more successful than others, to improve memory access speed. Caching is one popular technique to improve the memory access speed for frequently used or most recently used data. In caching, a small amount of dedicated very high-speed memory 110 is interposed between memory subsystem 106 and CPU 102. This high-speed memory is then employed to temporarily store frequently accessed or most recently used data. When there is a memory read request from the CPU, the cache memory is first checked to see whether it can supply the requested data. If there is a cache hit (i.e., the requested data is found in the cache memory), the faster cache memory, instead of the slower main memory, supplies the requested data at the higher cache memory access speed.

Caching, however, increases the overall complexity of the computer system architecture and its operating system. Further, the use of expensive and power-hungry cache memory (e.g., on-board high speed custom SRAM) disadvantageously increases cost, power consumption, and the like. Furthermore, the cache hit rate is somewhat dependent on the software application and other parameters. If the cache hit rate is low, there may not be a significant improvement in memory access speed to justify the added complexity and cost of a cache subsystem.

As mentioned above, the memory capacity in memory subsystem 106 also represents another constraint to higher overall system performance. Modern complex software, which is often employed to manipulate large database, graphics, sound, or video files, requires a large amount of main memory space for optimum performance. The performance of many computer systems can be greatly improved if more storage is provided in the computer system's main memory. Due to power consumption, board space usage, and cost concerns, however, most computer systems are manufactured and sold today with a less-than-optimum amount of physical memory on board. Consequently, the overall system performance suffers.

The same three issues pertaining to main memory 106 (i.e., the speed of data transfer to and from memory, the operating speed of the memory, and the storage capacity) also apply to a permanent memory subsystem (such as a hard disk). When a hard disk drive is employed for storing data, for example, the limited speed of the data transmission link between the hard disk drive and the main system bus, the slow access time due to the mechanical rotation nature of the hard disk's platters and the mechanical movement of the actuator arm that contains the read/write head, as well as the fixed storage capacity of the platters all represent factors that tend to limit system performance. Yet, with the advent of the Internet and improved multimedia technologies, users nowadays routinely transmit and store large graphics, video, and sound files using the permanent memory subsystem in their computers. Consequently, it is generally desirable to increase both the memory access speed and the storage capacity of the permanent memory subsystem.

The same three issues pertaining to main memory 106 (i.e., the speed of data transfer to and from memory, the operating speed of the memory, and the storage capacity) also apply to Network-Attached Storage (NAS) systems, storage area networks (SANs), RAID storage systems, and other networked electromagnetic or optical-based data storage systems. With reference to FIG. 2, irrespective of the protocol implemented on a transmission link 202 between a drive controller 204 and the actual storage media 206 (e.g., hard disks, optical platters, and the like), storage performance can be improved if the effective data throughput through transmission link 202 can be improved. This is true irrespective of whether the protocol implemented is serial ATA (S-ATA), IDE, FCAL, SCSI, Fiber Channel over Ethernet, SCSI over Ethernet, or any other protocol employed to transfer data between drive controller 204 and storage media 206. With respect to the storage capacity issue, there is a fixed capacity to storage media 206 based on physical limitations and/or formatting limitations. From a cost-effectiveness standpoint, it would be desirable to transparently increase the capacity of storage media 206 without requiring a greater number and/or larger platters, or changing to some exotic storage media.

The data transmission bandwidth bottleneck also exists within modern high-speed computer networks, which are widely employed for carrying data among networked devices, whether across a room or across a continent. In a modern high-speed computer network, the bottlenecks may, for example, reside with the transmission media (e.g., the wireless medium, the copper wire, or the optical fiber) due to the physical characteristics of the media and the transmission technology employed. Further, the bottleneck may also reside with the network switches, hubs, routers, and/or add-drop multiplexers which relay data from one network node to another. In these devices, the line cards and/or switch fabric are configured to operate at a fixed speed, which is typically limited by the speed of the constituent devices comprising the line card. The device speed is in turn dictated by the latest advances in microelectronics and/or laser manufacturing capabilities. In some cases, the bottleneck may be with the protocol employed to transmit the data among the various networked devices. Accordingly, even if the transmission media itself (such as a fiber optic) may theoretically be capable of carrying a greater amount of data, the hardware, software, and transmission protocols may impose a hard limit on the amount of data carried between two nodes in a computer network.

To further discuss the foregoing, there are shown in FIG. 3, in a simplified format, various subsystems of a typical Ethernet-based network 300. Components of Ethernet-based network 300 are well known and readily recognized by those skilled in the art. In general, digital data from a Media Access Controller (MAC) 302 is transformed into physical electrical or optical signals by a transceiver 304 to be transmitted out onto an Ethernet network 308 via a data transmission link 306, which is an Ethernet link in this case. MAC 302, as well as transceiver 304, generally operates at a predefined speed, which is dictated in part by the Ethernet protocol involved (e.g., 10 Mbps, 100 Mbps, 1 Gbps, or 10 Gbps). Thus, the throughput of data through the Ethernet arrangement 300 of FIG. 3 tends to have a finite limit, which cannot be exceeded irrespective of capacity requirement or the theoretical maximum capacity of data transmission link 306.

As the network grows and the capacity requirement for Ethernet-based network 300 increases, it is customary to upgrade MAC 302 and transceiver 304 and other associated electronics to enable data transmission link 306 to carry more data. With the advent of the Internet, however, a 300% growth in data traffic per year is not unusual for many networks. A hardware upgrade to one of the higher speed protocols, unfortunately, tends to involve network-wide disruptive changes (since the sending and receiving network nodes must be upgraded to operate at the same speed). A system-wide upgrade is also costly as many network nodes and components must be upgraded simultaneously to physically handle the higher speed protocol. It would be desirable to have the ability to enable Ethernet 300 to effectively carry more data for a given transmission speed. It would also be desirable to have the ability to upgrade, in a scalable manner, selective portions of the network so that both the upgraded and the legacy equipment can interoperate in an automatic, transparent manner.

In a commonly-owned, co-pending patent application entitled Data Optimization Engines And Methods Therefor (filed by inventor Isaac Achler on the same date, and incorporated by reference herein), various implementations of a data optimization engine and methods therefor are described in detail. In particular, various implementations of an optimization processor which are capable of performing at least one or both of the compression/decompression and encryption/decryption tasks are described in detail. Since the optimization processor and data optimization engine described in the above-discussed patent application have utility in many different environments, such as in computer systems and computer networks to transparently optimize the data transmission bandwidth, in storage systems (e.g., hard disks, RAID systems, Network-Attached Storage or NAS systems, Storage Area Networks or SANs, and other networked electromagnetic or optical-based data storage systems) to optimize the data transmission bandwidth and storage capacity, it is realized that it would be highly advantageous to create a universal, modular data optimization engine that can be easily and efficiently adapted to work with different protocols.

Generically speaking, for a data optimization engine to optimize a stream of data having a given protocol, certain issues need to be addressed in addition to the actual compression/decompression and/or encryption/decryption tasks themselves. To allow the data optimization engine to be universal, protocol adaptation, i.e., the translation of the data from the protocol received to one that can be understood by the optimization processor, needs to be performed. After the data is optimized by the optimization processor, the optimized data needs to undergo protocol adaptation again prior to outputting.

Data alignment and data parsing are also protocol-specific tasks that need to be handled differently for different data input protocols. Data alignment refers to the need to recognize and frame the incoming data properly with respect to some reference data frame as the incoming data is received. Data alignment facilitates data parsing, since efficient data parsing relies on the correct relative positioning of the various data fields within some reference data frame. For each data frame that can be optimized (since not all data frames are eligible for optimization), some portion of the optimizable data frame needs to be preserved while other portions can be optimized by the optimization processor. Data parsing separates the optimizable portion from the non-optimizable portion of the data frame so that the optimizable portion can be optimized by the optimization processor.

A related task is optimizable data handling, which refers to the need to reassemble the data frame, putting together the non-optimizable portion of the data frame with the optimizable portion after the optimization processor has finished its optimizing task. Optimizable data handling ensures that a properly reassembled data frame is presented at the output for transmission to the next hop or to the final destination. As mentioned, some incoming data frames may be non-optimizable, e.g., due to an explicit request from software or from some other higher layer in the communication stack. Bypass data handling needs to be performed on the incoming data to ensure that the data optimization engine will handle these non-optimizable data frames properly.
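By way of illustration only, the data parsing and optimizable data handling tasks described above may be sketched in software as follows. The fixed header and trailer lengths, the names `parse_frame`, `reassemble`, and `optimize_frame`, and the use of zlib as a stand-in for the optimization processor are all assumptions made for this sketch; they are not taken from any particular protocol or from the compression technique of the invention.

```python
import zlib
from dataclasses import dataclass

HEADER_LEN = 8   # hypothetical header size in bytes (assumption)
TRAILER_LEN = 4  # hypothetical trailer size in bytes (assumption)


@dataclass
class ParsedFrame:
    header: bytes   # non-optimizable portion, preserved as-is
    payload: bytes  # optimizable portion
    trailer: bytes  # non-optimizable portion, preserved as-is


def parse_frame(frame: bytes) -> ParsedFrame:
    """Data parsing: separate the optimizable portion from the rest."""
    if len(frame) < HEADER_LEN + TRAILER_LEN:
        raise ValueError("frame too short to contain header and trailer")
    end = len(frame) - TRAILER_LEN
    return ParsedFrame(frame[:HEADER_LEN], frame[HEADER_LEN:end], frame[end:])


def reassemble(parsed: ParsedFrame, new_payload: bytes) -> bytes:
    """Optimizable data handling: rejoin preserved and optimized portions."""
    return parsed.header + new_payload + parsed.trailer


def optimize_frame(frame: bytes, optimize=zlib.compress) -> bytes:
    """Parse, optimize only the payload, then reassemble the frame."""
    parsed = parse_frame(frame)
    return reassemble(parsed, optimize(parsed.payload))
```

In this sketch the trailer is carried through unchanged; a real protocol whose trailer holds a checksum over the payload would presumably need that field recomputed after optimization.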

Another task is congestion control, which is necessary to ensure that the optimization processor is not overloaded if incoming data is received at the data optimization engine in rapid bursts. Congestion control gives the optimization processor time to complete its optimization task on a frame-by-frame basis while minimizing and/or eliminating the possibility of dropping incoming data frames if they arrive in rapid bursts. Yet another related task is traffic handling, which ensures that while data optimization takes place within the inline data optimization engine, the communication channel remains error-free. Traffic handling is necessary if the data optimization engine is to be transparent to the transmitting and receiving devices.
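The congestion control task may be pictured, in a simplified software analogy, as a bounded buffer placed ahead of the optimization processor that refuses new frames when full instead of silently dropping them. The class name `IngressBuffer`, its methods, and the capacity value are hypothetical and serve only to illustrate the idea of holding off the sender during a burst.

```python
from collections import deque


class IngressBuffer:
    """Bounded frame buffer that applies backpressure instead of dropping."""

    def __init__(self, capacity: int):
        self.capacity = capacity  # maximum number of frames held (assumption)
        self._frames = deque()

    def ready(self) -> bool:
        """True while the upstream side may hand over another frame."""
        return len(self._frames) < self.capacity

    def offer(self, frame: bytes) -> bool:
        """Accept a frame if there is room; otherwise signal 'hold'."""
        if not self.ready():
            return False  # caller keeps the frame and retries later
        self._frames.append(frame)
        return True

    def take(self):
        """Hand the oldest buffered frame to the processor, or None if empty."""
        return self._frames.popleft() if self._frames else None
```

A burst of three frames into a buffer of capacity two would see the third `offer` return `False`; once the processor calls `take`, the held frame can be offered again and is accepted, so no frame is lost.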

Since these tasks all need to be performed, and they are all different for different protocols, the challenge of creating a universal data optimization engine rests, in part, in the ability to innovatively section and modularize the data optimization engine and to innovatively arrange the various circuits therein in a manner such that when the data optimization engine needs to be reconfigured to work with a different protocol, the reconfiguration may be done quickly and efficiently and changes to the data optimization engine may be minimized.

In view of the foregoing, there are desired improved techniques and apparatus for optimizing the data transmission bandwidth in data buses and network transmission links, as well as for optimizing the storage capacity of temporary and permanent memory in electronic devices and computer networks.

SUMMARY OF THE INVENTION

The invention relates generally to a highly modularized, protocol-flexible data optimization engine for performing high speed, adaptive, in-line optimization (compression/decompression and/or encryption/decryption) of data using either hardware or software. The data optimization engine includes a transmit interface circuit that is protocol-flexible, a high speed optimization processor, and a receive interface circuit that is also highly flexible with regard to the protocol on the transmission medium. The data optimization engine also implements, in one embodiment, a novel high speed adaptive compression technique that improves on the standard LZW compression.

The invention relates, in one embodiment, to a data optimization engine for optimizing selected frames of a first stream of data. The data optimization engine includes a transmit interface circuit coupled to an optimization processor, the transmit interface circuit being configured for receiving the first stream of data. The transmit interface circuit includes a traffic controller circuit for separating frames in the first stream of data into a first optimizable frame and a first non-optimizable frame, and an optimization front-end circuit coupled to the traffic controller circuit to receive at least a first portion of the first optimizable frame. The optimization front-end circuit includes a protocol conversion circuit configured to convert data in the first portion of the first optimizable frame from a first protocol to a second protocol suitable for processing by the optimization processor, the first protocol specifies a first word length, the second protocol specifies a second word length different from the first word length. The optimization front-end circuit further includes an end-of-optimization-file processing circuit, the end-of-optimization-file processing circuit flagging an end of the first portion of the first optimizable frame to the optimization processor, wherein the optimization processor is configured to optimize the first portion of the first optimizable frame by performing at least one of compression and encryption on the first portion of the first optimizable frame.
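To make the word-length aspect of the protocol conversion concrete, the following sketch regroups a sequence of words of one length into words of a different length and marks the last output word, loosely mirroring the end-of-optimization-file flag. It is an assumption-laden illustration: the function names `repack_words` and `flag_end`, the most-significant-bit-first ordering, and the zero padding of the final word are choices made here, and an actual conversion for a given protocol may also involve code tables or other decoding that mere regrouping does not capture.

```python
def repack_words(words, in_bits, out_bits):
    """Regroup in_bits-wide words into out_bits-wide words, MSB first.

    The final output word is zero-padded on the right if the total number
    of input bits is not a multiple of out_bits (an assumption of this sketch).
    """
    in_max = (1 << in_bits) - 1
    out_mask = (1 << out_bits) - 1
    acc = 0      # bit accumulator
    nbits = 0    # number of valid bits currently held in acc
    out = []
    for w in words:
        if w < 0 or w > in_max:
            raise ValueError("input word does not fit in in_bits")
        acc = (acc << in_bits) | w
        nbits += in_bits
        while nbits >= out_bits:
            nbits -= out_bits
            out.append((acc >> nbits) & out_mask)
            acc &= (1 << nbits) - 1  # keep only the bits not yet emitted
    if nbits:
        out.append((acc << (out_bits - nbits)) & out_mask)
    return out


def flag_end(words):
    """Pair each word with a flag that is True only for the last word."""
    last = len(words) - 1
    return [(w, i == last) for i, w in enumerate(words)]
```

For instance, two 8-bit words `0xAB, 0xCD` regroup into four 4-bit words `0xA, 0xB, 0xC, 0xD`, and `flag_end` would mark only the final `0xD` as the end of the optimizable portion.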

These and other features of the present invention will be described in more detail below in the detailed description of the invention and in conjunction with the following figures.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which like reference numerals refer to similar elements and in which:

FIG. 1 shows simplified CPU, bus, and memory subsystems within an exemplary computer to facilitate discussion of the transmission bandwidth bottleneck issue therein.

FIG. 2 is a simplified illustration of a portion of a data storage system to facilitate discussion of the transmission bandwidth bottleneck and capacity bottleneck issues therein.

FIG. 3 illustrates, in a simplified format, various subsystems of a typical Ethernet-based network to discuss the bandwidth bottleneck issue therein.

FIG. 4 shows, in accordance with one embodiment of the present invention, a high level block diagram of the inventive data optimization engine.

FIG. 5 shows, in accordance with one embodiment of the present invention, how a data optimization engine may be deployed in a Fiber Channel setting.

FIG. 6 depicts, in accordance with one aspect of the present invention, how a data optimization engine may be employed to improve the performance of a data storage system.

FIG. 7 depicts, in accordance with one aspect of the present invention, how a data optimization engine may be employed to improve performance in a computer system when a CPU accesses its main memory.

FIGS. 8 and 9 depict how a data optimization engine may be employed in a communication network.

FIG. 10 shows, in accordance with one embodiment of the present invention, an arrangement whereby the inventive data optimization engine is interposed between two PCI devices in an extended PCI (PCI-X) system.

FIG. 11 shows, in a logic diagram format, the logic functions of a data optimization engine in accordance with one embodiment of the present invention.

FIG. 12 is a flowchart describing the inventive HSO compression technique in accordance with one aspect of the present invention.

FIG. 13 is a flowchart describing the inventive HSO decompression technique in accordance with one aspect of the present invention.

FIG. 14 shows, in accordance with one embodiment of the present invention, another high-level block diagram of the data optimization engine.

FIG. 15 illustrates a typical Fiber Channel data frame.

FIG. 16 illustrates the structure of an Idle word, representing a type of primitive signal word in the Fiber Channel protocol.

FIG. 17 shows, in accordance with one embodiment of the present invention, a transmit interface circuit in greater detail.

FIG. 18 illustrates, in accordance with one embodiment of the present invention, a flowchart showing how the traffic controller circuit may process each 40-bit word received from the frame alignment circuit.

FIG. 19 illustrates, in accordance with one embodiment of the present invention, how the end-of-optimized-data-flag-handler circuit handles optimized data received from the optimization processor.

FIG. 20 illustrates, in accordance with one embodiment, how the protocol conversion circuit may perform the protocol conversion such that output words having the correct polarities may be output to bus framing circuit.

FIG. 21 shows, in accordance with one embodiment of the present invention, a receive interface circuit in greater detail.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

The present invention will now be described in detail with reference to a few preferred embodiments thereof as illustrated in the accompanying drawings. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be apparent, however, to one skilled in the art, that the present invention may be practiced without some or all of these specific details. In other instances, well known process steps and/or structures have not been described in detail in order to not unnecessarily obscure the present invention.

FIG. 4 shows, in accordance with one embodiment of the present invention, a high level block diagram of the inventive data optimization engine 400. Referring now to FIG. 4, the inventive data optimization engine includes three main logic blocks in each of the transmit and receive data paths. In the transmit data path, data input at a bus 402 is received by a protocol recognition engine 404. Protocol recognition engine 404, which is tailored to one or more specific protocols, serves to extract the payload from the input data, which is formatted in accordance with the dictates of the protocol employed. By way of example, data input at bus 402 may conform to the Peripheral Component Interconnect (PCI) interface, PCI-X interface (an extension of the PCI interface to enable higher speed), Infiniband (a high speed competing protocol to PCI), High Speed Serial Interface (HSSI), 10-bit interface (TBI, such as that developed under the guidance of the X3 technical committee of the American National Standards Institute), serial ATA (Serial AT attachment, an interface for coupling with storage devices), or the 64/66 protocol (which may be seen as either a derivative of the 10-bit protocol or an extension of the PCI protocol). Protocol recognition engine 404 may also perform some or all of other tasks such as traffic handling, congestion control, data alignment, data parsing, optimizable data handling, and the like. These tasks are discussed in greater detail in connection with FIG. 17 herein.

FIG. 4 also shows block 440, representing the processing block that may be provisioned within protocol recognition engine 404 to handle higher layer or overlay protocols such as, for example, Ethernet (1/10/40 Gigabit), Fiber Channel (1/2/10 Gigabit), Extended Attachment Unit Interface (XAUI), or I-SCSI (a storage over Ethernet interface).

The payload extracted by protocol recognition engine 404 is then transmitted to a transmit payload processor 406 via a bus 408. In one embodiment, protocol recognition engine 404 also performs congestion management. That is, protocol recognition engine 404 manages the flow of data into transmit payload processor 406 to ensure that transmit payload processor 406 is not overloaded. Additionally, protocol recognition engine 404 may also perform some level of bypass traffic management, such as detecting certain data frames or words that do not need to and/or should not be compressed and/or encrypted based on the information provided in the header. These data frames or words are then permitted to bypass transmit payload processor 406 to proceed immediately to the output port.
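The bypass traffic management described above can be illustrated with a short software sketch in which each frame is either routed through an optimizing function or passed straight to the output based on its header. The `Frame` type, the `no_optimize` header field, and the functions `is_optimizable` and `route` are hypothetical names introduced only for this example; the actual header criteria depend on the protocol in use.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    header: dict    # hypothetical parsed header fields (assumption)
    payload: bytes


def is_optimizable(frame: Frame) -> bool:
    """Decide from the header whether the payload should be optimized.

    The 'no_optimize' field is an invented example of header information
    that requests bypass; empty payloads are also bypassed.
    """
    return bool(frame.payload) and not frame.header.get("no_optimize", False)


def route(frames, optimize):
    """Send optimizable frames through `optimize`; bypass the rest.

    Frame order is preserved at the output.
    """
    out = []
    for frame in frames:
        if is_optimizable(frame):
            out.append(Frame(frame.header, optimize(frame.payload)))
        else:
            out.append(frame)  # bypass: forwarded unchanged
    return out
```

Here `optimize` stands in for the transmit payload processor; any callable taking and returning `bytes` may be supplied.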

At transmit payload processor 406, compression and/or encryption may be performed. Whether transmit payload processor 406 performs compression and/or encryption on a particular data block received from protocol recognition engine 404 depends on many factors, which will be discussed later herein. After compression and/or encryption, transmit payload processor 406 outputs the processed payload data onto a bus 412 to be transmitted to a protocol restoration engine 410. Since transmit payload processor 406 deals primarily with the payload portion of the data received on bus 402, it is necessary to make the processed payload data transmitted from transmit payload processor 406 conform to the appropriate protocol for eventual transmission to another device. Thus protocol restoration engine 410 performs the appropriate processing and packaging on the processed payload data to render the processed payload data conformant to the protocol expected by the downstream device receiving such data, i.e., a device coupled to media 414 (which can be optical, wired, or wireless media).

In accordance with one advantageous embodiment, the protocol restoration engine 410 may in fact package the optimized payload data received from the transmit payload processor in a protocol different from the protocol associated with that of bus 402. For example, the data may employ the Fiber Channel protocol on bus 402 but may be packaged by protocol restoration engine 410 to be transmitted out on media 414 using the Gigabit Ethernet protocol. In fact, any of the aforementioned protocols or a well-known protocol may be received and data optimization engine 400 may perform protocol translation in addition to or in place of optimization so that a different protocol, which may be any of the aforementioned protocols or another well-known protocol, may be sent out. Together, protocol recognition engine 404 and protocol restoration engine 410 may be thought of as the interface circuitry for transmit payload processor 406.

On the receive path, the protocol recognition engine 420 receives data from media 418 and performs payload extraction (and/or congestion management and/or bypass traffic management) and other tasks similar to those performed by protocol recognition engine 404 associated with the transmit path. The payload extracted is then transmitted to a receive payload processor 422 via a bus 416. Receive payload processor 422 then decrypts and/or decompresses the payload as necessary. Whether receive payload processor 422 performs decryption and/or decompression on a particular data block received from protocol recognition engine 420 depends on many factors, which will be discussed later herein. After decryption and/or decompression, receive payload processor 422 outputs the processed payload data onto a bus 424 to be transmitted to a protocol restoration engine 426. Since receive payload processor 422 deals primarily with the payload portion of the data received on media 418, it is necessary to make the processed payload data transmitted from receive payload processor 422 conform to the appropriate protocol for eventual transmission to another device. Thus protocol restoration engine 426 performs the appropriate processing and packaging on the processed payload data to render the processed payload data conformant to the protocol expected by the device receiving such data from media 430 (which can be optical, wired, or wireless media). Again, protocol translation may occur on the receive path as well.
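The symmetry between the transmit and receive payload processors can be illustrated by a round-trip sketch in which the transmit side compresses a payload only when doing so helps and the receive side restores the original bytes. The one-byte marker and the functions `transmit` and `receive` are inventions of this example, and zlib is used merely as a readily available stand-in; how the engine actually signals whether a block was optimized is not specified in this passage.

```python
import zlib

COMPRESSED = b"\x01"  # hypothetical marker: payload was compressed
RAW = b"\x00"         # hypothetical marker: payload sent unchanged


def transmit(payload: bytes) -> bytes:
    """Transmit side: compress only if that makes the payload smaller."""
    compressed = zlib.compress(payload)
    if len(compressed) < len(payload):
        return COMPRESSED + compressed
    return RAW + payload


def receive(data: bytes) -> bytes:
    """Receive side: undo whatever the transmit side did."""
    marker, body = data[:1], data[1:]
    if marker == COMPRESSED:
        return zlib.decompress(body)
    if marker == RAW:
        return body
    raise ValueError("unknown or missing marker")
```

For any payload `p`, `receive(transmit(p))` returns `p`, which is the transparency property the transmitting and receiving devices rely on.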

To provide an example of how the data optimization engine of FIG. 4 may be employed, FIG. 5 shows, in accordance with one embodiment of the present invention, how a data optimization engine 502 may be deployed in a Fiber Channel setting. In FIG. 5, the data optimization engine 502 is interposed between a Fiber Channel controller 504 and a SERDES (Serializer/Deserializer) 506 via 10-bit interfaces 508 and 510, respectively. This 10-bit interface implements the 10-bit encoding scheme to transmit information on the Fiber Channel link. Further information regarding the 10-bit encoding may be found in the text “Fibre Channel: A Comprehensive Introduction” by Robert W. Kembel (Northwest Learning Associates, Inc., Tucson, Ariz., 2000), incorporated by reference herein. Fiber Channel controller 504 may, for example, be part of an I/O plug-in board or an integral part of a computer system.

Data received at Fiber Channel controller 504 is compressed and/or encrypted as appropriate in real time by data optimization engine 502 prior to being output to SERDES 506 for transmission over media 520. Data received from media 520 is decrypted and/or decompressed as appropriate by data optimization engine 502 prior to being output to Fiber Channel controller 504. It should be noted that although the Fiber Channel protocol is employed in the example of FIG. 5, other protocols such as some of those mentioned (e.g., Ethernet, Infiniband, XAUI) may well be implemented.

The data optimization engine may find use in many diverse applications where there is a need to increase the bandwidth of the transmission link, the memory/storage access speed and capacity, and/or a need for the ability to implement compression/encryption in a manner so as to guarantee compatibility with other devices irrespective of whether those other devices implement the data optimization engine.

FIG. 6 depicts, in accordance with one aspect of the present invention, how a data optimization engine may be employed to improve the performance of a data storage system. In FIG. 6, there is shown a host device 602, which transmits data to and receives data from a storage device 604 using a suitable protocol. By way of example, FIG. 6 shows four exemplary interfaces 606, 608, 610, and 612, representing alternative interfaces for permitting host 602 to communicate with storage device 604 using the Fiber Channel protocol, the Ethernet protocol, the SCSI protocol, or the Infiniband protocol respectively.

The data optimization engine may be disposed at location 614, either as a separate device or integrated directly with host device 602. For manufacturers of processors or motherboards, this arrangement is useful to transparently improve I/O performance vis-à-vis storage device 604. Alternatively, the data optimization engine may be disposed at locations 616 and 618 to facilitate communication via the Fiber Channel or the Ethernet protocols. This arrangement is useful for peripheral device manufacturers, who may want to incorporate the advanced compression and encryption capabilities of the inventive data optimization engine without requiring changes in either host device 602 or storage device 604 (which may be manufactured by other parties). Alternatively, the data optimization engine may be integrated with storage device 604 (shown by reference number 630), thereby allowing storage device 604 to store more information and respond to memory requests in less time without requiring changes in either host device 602 or interfaces 606–612. Note that in general, only one data optimization engine is required (i.e., only one of data optimization engines 614, 616, or 618 is required) between host device 602 and storage device 604.

FIG. 7 depicts, in accordance with one aspect of the present invention, how a data optimization engine may be employed to improve performance in a computer system when a CPU accesses its main memory. In the case of FIG. 7, since data communication between a CPU 702 and a memory 704 occurs within a closed system, encryption is generally unnecessary. However, the encryption capability of the data optimization engine may be employed if encryption is deemed desirable (e.g., in highly secure systems or when the communication takes place over a networked link). With respect to FIG. 7, CPU 702, memory 704, cache 705, and memory controller 706 are conventional and generally communicate among themselves using a bus-based protocol or a high speed serial protocol. The data optimization engine may be disposed at a location 708, which is generally considered part of the CPU subsystem or even integrated within the die of the processor itself. This arrangement is highly advantageous for processor manufacturers looking for a competitive advantage since it permits the CPU to transparently and apparently improve the rate of data transfer between itself and memory 704, as well as to transparently and apparently increase the capacity of memory 704 as well as to implement encryption without taking up a significant amount of CPU processing resources, all without requiring changes in memory controller 706 or memory 704.

The data optimization engine may be disposed at location 710, i.e., between CPU 702 and memory controller 706. In one preferred embodiment, the data optimization engine may be made part of the memory controller subsystem or integrated with one of the memory controller ICs. This arrangement is advantageous for memory controller manufacturers who wish to offer the ability to apparently increase the speed of data transfer between CPU 702 and memory 704 without requiring changes in memory 704, CPU 702, or cache 705. In the background, the data optimization engine compresses (and/or encrypts) the data before passing the processed data onward. The fact that the data is optimized means that fewer bits need to be transmitted between CPU 702 and memory 704. This increases, in an apparent manner, the transmission speed/bandwidth of the bus between CPU 702 and memory 704. Furthermore, fewer bits need to be stored in memory 704, which means that fewer memory cycles are needed to store/access the required data. This in turn also increases the speed, in an apparent manner, of memory access by CPU 702 for any given file. It should be pointed out that the apparent speed increase and bandwidth increase due to the fact that fewer bits need to be transmitted also apply in both the data storage system setting (e.g., FIG. 6) and in the networking setting.
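As a rough worked illustration of the apparent bandwidth increase described above, the following Python sketch scales a link rate by the ratio of original bits to optimized bits. The function name, the link rate, and the bit counts are hypothetical and are not taken from any embodiment; the sketch also ignores header overhead and processing latency.

```python
def apparent_bandwidth(link_rate_bps: float, original_bits: int, optimized_bits: int) -> float:
    """Return the apparent rate at which original (uncompressed) bits cross the link.

    Assumption: only payload bits are counted; headers and latency are ignored.
    """
    if optimized_bits <= 0:
        raise ValueError("optimized_bits must be positive")
    return link_rate_bps * original_bits / optimized_bits


# Hypothetical numbers: a 1 Gbit/s bus and a block that compresses from 1000 to 500 bits.
rate = apparent_bandwidth(1.0e9, 1000, 500)
```

Under these assumed numbers, halving the number of bits transferred makes the link appear to carry the original data at twice its physical rate.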

The data optimization engine may be disposed at a location 712, i.e., as part of memory 704. This arrangement is advantageous for memory manufacturers, such as DRAM or RAM manufacturers or hard disk or optical drive manufacturers, to apparently increase the speed of data transfer between CPU 702 and memory 704 as well as to increase the apparent capacity of the physical memory without requiring changes in memory controller 706, CPU 702, or cache 705. In the background, the data optimization engine compresses the data before storing on the physical media to reduce the number of bits that need to be stored. Since the bottleneck to higher performance in permanent memory subsystems tends to be found in the relatively slow mechanical movement of the access arm (as in the case of hard disk drives) or the speed at which the bits can be recorded onto storage locations in the media (e.g., the speed at which the magnetic particles can be aligned to store information or the speed at which the optical media records information, or the speed at which the latches or capacitors may be able to store or read a bit of data), reducing the number of bits that need to be stored tends to increase the overall performance of memory 704 as well as apparently increase its capacity to store information.

FIGS. 8 and 9 illustrate, in accordance with embodiments of the present invention, how a data optimization engine may be employed to transparently and apparently increase the data transmission speed and bandwidth (i.e., carrying capacity) between networked devices (such as network interface cards, routers, or switches). In FIG. 8, a data optimization engine may be provided with each networked device in network 802. In this case, the payload data is compressed and/or encrypted for transmission prior to being transmitted on a network link in order to maximize the speed and bandwidth of the link, as well as to ensure data security (if encryption is performed). Thus, the payload data is compressed and/or encrypted by network interface card (NIC) 804 prior to being transmitted via link 806 to switch 808. At switch 808, the destination is looked up to ascertain the appropriate output port. If the destination device does not have the data optimization engine, the payload data may be decrypted and/or decompressed in switch 808. Thereafter, the data is transmitted out via link 810 to a router 812. At router 812, the destination is looked up to ascertain the appropriate output port. If the destination device does not have the data optimization engine, the payload data may be decrypted and/or decompressed in router 812 (unless decryption and/or decompression occurred already in switch 808). Thereafter, the data is again transmitted out via a link 814 to a NIC 816. At NIC 816, the data is decrypted and/or decompressed for use by a data optimization engine provisioned therein. If NIC 816 does not have a data optimization engine, the decryption and/or decompression occurs at one of the earlier nodes as discussed.

Since the data optimization engine of the present invention can transparently work with legacy networked devices, a NIC 822 or a switch 824 which does not have the data optimization engine built-in can also utilize switch 808 and router 812 to transmit data to and receive data from NICs 804 and 816. If the data received at switch 808 or router 812 is uncompressed and/or not encrypted, the inventive data optimization engine can perform encryption and/or compression, effectively upgrading the legacy networked devices up to the level of the upgraded network. Furthermore, if unencrypted/uncompressed data arrives at a NIC having therein the inventive data optimization engine, the data optimization engine therein simply does not perform decryption and/or decompression before passing the data on to its host. This is an advantage since it allows network 802 to be upgraded in a modular, gradual manner. In other words, one part of the network may be upgraded and be expected to work with other parts of the network, which contain legacy devices. This ability minimizes disruption to the network during upgrade cycles and gives network managers great flexibility in the provisioning of their networks.

FIG. 9 depicts a network 852 wherein switch 858 and router 862 are both legacy network devices without the data optimization capability. NICs 854 and 866 are, however, equipped with the inventive data optimization engine. The situation of FIG. 9 is often realized, for example, when two computers equipped with NICs having integrated therein the inventive data optimization engines communicate with one another via a public network. In this case, the ability to reduce the amount of data that needs to be transmitted (via compression) still yields advantages since such optimization apparently improves the speed of data transfer between NICs 854 and 866 (since fewer bits need to be transmitted for a given amount of information) and the carrying capacity of links 856, 860, and 864. Encryption increases the security of the data transmitted, which is also an important consideration when data is transmitted/received over computer networks.

Note that when only one of NIC 854 or NIC 866 is equipped with the data optimization capability and the other is not, data transmission is still possible. In this case, the switch or router device equipped with the data optimization capability simply receives the uncompressed (and/or nonencrypted) data and passes such data transparently through the data optimization engine. Prior to retransmission of the data on the output port of that switch or router, the payload data may be compressed and/or encrypted to transparently improve the transmission speed or network capacity or data security. In one embodiment, however, a field may be employed in the header portion of the received data that informs switch 858 or router 862 that the payload data should not be compressed and/or encrypted (as in the case wherein the receiving NIC does not have the ability to decrypt and/or decompress).

In yet another embodiment, the networked devices at the edge of the network (e.g., the Label Edge Routers or LERs in an MPLS network) are all equipped with data optimization engines to permit all the data transferred among nodes of the network to be compressed and/or encrypted irrespective of whether the sending and/or receiving NICs have the ability to encrypt/decrypt (and/or compress/decompress). Thus, the payload data is compressed and/or encrypted once at the input edge of the network and decrypted and/or decompressed again at the output edge of the network. In between, the payload data is in its compressed and/or encrypted form to yield the bandwidth/speed-enhancing advantages and/or the security advantages.

In yet another embodiment, only the routers or switches at the edge of the network for a given data flow perform the compression/decompression and/or encryption/decryption even though the network nodes in between may also be provisioned with the inventive data optimization engines (which can perform the compression/decompression and/or encryption/decryption for other data flows). In this case, the data frames or blocks may be marked with a flag (e.g., in the header) so as to ensure that the compression/decompression and/or encryption/decryption cycle takes place only once through the network. This is an advantage in heterogeneous networks (such as the Internet) where no single entity may control the various end-to-end paths through which various data flows are expected to traverse.

Irrespective of the specific implementation, the inventive data optimization engine allows network providers to apparently increase the speed of data transmission among the nodes of the network, as well as apparently increase the capacity of the network links, as well as increase the data security among the network nodes without requiring an upgrade to all the NICs and/or all network nodes to those capable of compression/decompression and/or encryption/decryption.

FIG. 10 shows, in accordance with one embodiment of the present invention, an arrangement whereby the inventive data optimization engine is interposed between two PCI devices 1002 and 1004 in an extended PCI (PCI-X) system. In a PCI-based system, a PCI device may either be a PCI master or a PCI target, depending on the type of communication that takes place between itself and one or more other PCI devices.

For discussion purposes, there are two broad types of transaction that PCI device 1002 may wish to initiate vis-à-vis PCI device 1004. PCI device 1002 may write configuration data to PCI device 1004 via the CW (configuration write) transaction 1006A. In this case, data, address, signaling, and other types of information pertaining to configuration would be sent from PCI device 1002 and received and/or acknowledged by PCI device 1004. Likewise, PCI device 1002 may receive configuration information from PCI device 1004 via the CR (configuration read) transaction 1008A. Again, in this CR transaction, data, address, signaling, and other types of information pertaining to configuration would be sent from PCI device 1004 and received and/or acknowledged by PCI device 1002. Configuration read transactions may be initiated by either PCI device 1002 or PCI device 1004 to enable PCI device 1002 to receive configuration data.

Memory Write (MW) transaction 1010 and Memory Read (MR) transaction 1012 are two other types of transaction between PCI device 1002 and PCI device 1004. In MW transaction 1010, PCI device 1002 writes one or more blocks of data to PCI device 1004 at certain address locations. In addition to clocking and signaling data, both the address and data are specified. In MR transaction 1012, PCI device 1002 requests one or more blocks of data from PCI device 1004. Again, in addition to clocking and signaling data, both the address and data are specified.

As shown in FIG. 10, a data optimization engine 1020 is interposed inline between PCI device 1002 and PCI device 1004 and monitors the transactions between these two devices. Configuration transactions are passed through data optimization engine 1020 substantially transparently without significant processing. In FIG. 10, CW transaction 1006A and CR transaction 1008A are shown passing substantially transparently through data optimization engine 1020, as are CW transaction 1006B and CR transaction 1008B.

Memory write transactions MW 1010, on the other hand, are examined by optimization processor 1030 for possible encryption and/or compression. If encryption and/or compression are appropriate for this data, the data to be written to PCI device 1004 is encrypted and/or compressed (shown by reference number 1040) prior to being transmitted to PCI device 1004.

Conversely, memory read transactions 1012 are examined by optimization processor 1030 for possible decryption and/or decompression. If decryption and/or decompression are appropriate (shown by reference number 1042), the data to be transferred from PCI device 1004 to PCI device 1002 is decrypted and/or decompressed prior to being transmitted to PCI device 1002.

Within the optimization processor, there are two engines: a compression engine and a decompression engine. In one embodiment, at the output side of the compression engine, there is provided a packer, which receives the compression output that comes from the compression engine from time to time and packs that output as a continuous stream in groups of n, with n being the number of bits required by the interface circuitry. Thus, the packer is flexible with regard to the number of bits of data that it packs into. For example, a 3-bit code output is received by the packer from time to time as output by the compression engine, and is packed by the packer into groups of two, assuming 2 is the number of bits required by the interface circuitry.

At the input side of the decompression engine, there is provided a corresponding unpacker, which receives from the packer associated with the compressor continuous streams of data in groups of n, with n being the number of bits employed by the interface circuitry. In this case, the unpacker then unpacks this stream of bits into the compressed code having a size corresponding to the size of the compressor output code. In the previous example, the unpacker would receive a stream of compressed data in groups of two and unpack this stream into 3-bit codes to be fed to the decompressor.

If the packing results in a partial group, then padding may be needed. For example, if the compression output code is 11 bits and the interface circuitry requires 8 bits, the receipt of 3 compression output codes is 33 bits of data. Packing 33 bits of data into groups of 8 will result in a partial group. In one embodiment, padding is performed so that the number of bits, including the pad, is a multiple of n (or a multiple of 8 in this example). Thus, another 7 bits will be padded. In another embodiment, this is solved by padding the 33 bits up to a group size that is equal to the size of the compression output code multiplied by the size of the group output by the packer. In this example, this group size is 88 bits (or 11 bits × 8 bits). In other words, 55 bits are padded. The unpacker then looks at each 88-bit group that comes in, and in any 88-bit group that contains the EOF, the padding that comes after the EOF is ignored.
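The packing, unpacking, and padding arithmetic described above can be sketched in Python as follows. This is only an illustrative sketch: the function names are hypothetical, the bit order (most significant bit first) and zero-valued padding are assumptions, and the unpacker here is told how many codes to expect rather than detecting an EOF code as the text describes.

```python
from typing import List, Tuple


def padding_needed(total_bits: int, multiple: int) -> int:
    """Number of pad bits that brings total_bits up to a multiple of `multiple`."""
    return (-total_bits) % multiple


def pack(codes: List[int], code_bits: int, n: int) -> Tuple[List[int], int]:
    """Concatenate fixed-width codes (MSB first, assumed) and cut into n-bit groups.

    Returns the groups and the number of zero pad bits appended to the last group.
    """
    bits = "".join(format(c, f"0{code_bits}b") for c in codes)
    pad = padding_needed(len(bits), n)
    bits += "0" * pad
    groups = [int(bits[i:i + n], 2) for i in range(0, len(bits), n)]
    return groups, pad


def unpack(groups: List[int], code_bits: int, n: int, count: int) -> List[int]:
    """Rebuild `count` codes of code_bits each from n-bit groups, ignoring the pad."""
    bits = "".join(format(g, f"0{n}b") for g in groups)
    return [int(bits[i * code_bits:(i + 1) * code_bits], 2) for i in range(count)]


# Three 11-bit codes packed for an 8-bit interface: 33 bits plus 7 pad bits.
codes = [0b10101010101, 0x3FF, 5]
groups, pad = pack(codes, code_bits=11, n=8)
```

With the numbers from the text, `padding_needed(33, 8)` gives the 7-bit pad of the first embodiment and `padding_needed(33, 88)` gives the 55-bit pad of the second.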

FIG. 11 shows, in a logic diagram format, the logic functions of data optimization engine 1020 in accordance with one embodiment of the present invention. In block 1102, the method first decides whether the transaction under consideration is a control transaction or a data transfer. If the transaction under consideration is a transaction other than a data transfer, the method proceeds to block 1104 to pass the transaction substantially transparently through the data optimization engine. On the other hand, if the transaction is a data transfer transaction, the method proceeds to block 1106. One skilled in the art should readily appreciate that the discussion also applies to other types of data transfer transactions, such as data transmission inside a computer system or between a computer and its storage device(s).

In block 1106, it is ascertained whether the data transfer transaction under consideration is a transmit transaction or a receive transaction. In general, receive data appears on the receive data input port; transmit data appears on the transmit data input port. If a transmit transaction is detected, the method proceeds to block 1108 to ascertain whether the data therein is compressible. In one embodiment, the header can be analyzed to see if the data is already compressed, or if the data is of a type that cannot be compressed. This may be indicated via one or more fields in the header. By way of example, the Fiber Channel header typically has one or more fields to indicate such information. Alternatively or additionally, this information may be provided by higher level software in a pre-determined field. If the examined transmit data contains non-compressible data, compression is not performed and the data is immediately passed to block 1110 to ascertain whether encryption should be performed.

In block 1110, the decision whether to encrypt may be based on whether an encryption key is detected. In most public key encryption schemes, a key is typically present if encryption is desired. Of course there exist other ways to detect whether encryption is desired, depending on the encryption scheme employed (such as, e.g., flagging in an appropriate field in the header of the data frames). If encryption is desired (as ascertained in block 1110), the method proceeds to block 1120 to encrypt. After encryption, the encrypted data is passed to block 1124 to transfer out. On the other hand, if encryption is not desired (as ascertained in block 1110), the method bypasses the encryption block 1120 and proceeds directly to block 1124 to transfer the data out.

If the transmit transaction under consideration contains compressible data (as ascertained in block 1108), the method proceeds to block 1122 to perform compression. Thereafter, the compressed data is passed onto block 1110 to decide whether encryption should also be performed. If encryption is desired (as ascertained in block 1110), the method proceeds to block 1120 to encrypt. In general, any encryption technique may be employed. In one embodiment, encryption is performed using 128-bit AES public key encryption. However, since the inventive data optimization engine performs compression prior to encryption, even lower powered encryption schemes (e.g., 64-bit public key) may be employed with a high degree of confidence since the combination of compression and subsequent encryption renders the encrypted data much more difficult to break than encryption alone. This is an advantage since it is not possible in some markets, due to governmental restrictions or other administrative restrictions, to employ a high-powered encryption scheme.

Thereafter, the encrypted data is passed to block 1124 to transfer out. On the other hand, if encryption is not desired (as ascertained in block 1110), the method bypasses the encryption block 1120 and proceeds directly to block 1124 to transfer the data out.

As can be seen, when a data transmit transaction is under consideration, the data optimization engine, in real time, decides whether to compress. Irrespective of whether compression is performed, another independent decision is made whether to encrypt. There is thus a great deal of flexibility with regard to how data may be treated prior to being sent onward to the receiving device/interface.
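The two independent transmit-path decisions of FIG. 11 (blocks 1108 through 1124) can be sketched in Python as follows. The sketch rests on several assumptions that are not part of the described embodiments: the function and parameter names are hypothetical, zlib (DEFLATE) stands in for the compression engine, and a trivial XOR routine stands in for the encryption engine purely to keep the example self-contained. The XOR routine provides no security.

```python
import zlib
from typing import Optional, Tuple


def xor_stand_in(data: bytes, key: bytes) -> bytes:
    """Placeholder for the encryption engine (NOT secure; illustration only)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def transmit(payload: bytes, compressible: bool,
             key: Optional[bytes] = None) -> Tuple[bytes, bool, bool]:
    """Return (data, was_compressed, was_encrypted) for one transmit transaction."""
    compressed = False
    if compressible:                          # block 1108: is the data compressible?
        payload = zlib.compress(payload)      # block 1122: compress (zlib as stand-in)
        compressed = True

    encrypted = False
    if key:                                   # block 1110: key present, so encrypt
        payload = xor_stand_in(payload, key)  # block 1120: encrypt (stand-in)
        encrypted = True

    return payload, compressed, encrypted     # block 1124: transfer out


data, was_compressed, was_encrypted = transmit(b"A" * 64, compressible=True, key=b"k")
```

The compression step runs before the encryption step, matching the order in the text, and either step can be skipped without affecting the other.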

On the other hand, if the data transfer transaction is a receive transaction (as ascertained in block 1106), the method proceeds to block 1130 to ascertain whether the data received was encrypted earlier. In one embodiment, the information pertaining to whether a data frame in a data block was encrypted may be stored using a bit in the header of the data frame (e.g., a SONET or Ethernet header). Alternatively or additionally, the information pertaining to whether a data frame or a data block (which comprises multiple data frames) was encrypted may also be stored in a table/database associated with a data storage device during a data write transaction to that data storage device. This table/database is then consulted during a data retrieval transaction to determine whether encryption was involved. In yet another embodiment, encryption is ascertained by detecting whether a key is present with the data frame or data block associated with the memory read transaction (assuming a public key encryption scheme).

If the data associated with the receive transaction is encrypted data, the method proceeds to block 1132 to decrypt the data block received. Thereafter, the method proceeds to block 1134 to ascertain whether the data was compressed earlier. On the other hand, if the data associated with the receive transaction is non-encrypted data, the method bypasses block 1132 and proceeds directly to block 1134 (which ascertains whether the data associated with the memory transaction was compressed).

In one embodiment, each data frame in the block is marked with a bit that flags whether that data frame contains compressed data. By way of example, this bit may be in the header of the data block itself (such as the Ethernet or SONET header). In another embodiment, the information pertaining to whether a data block contains compressed data is stored in a table or database in the memory storage device (e.g., hard drive). During a transmit transaction, the table is updated if the data block stored contains compressed data. Responsive to the data retrieval request, the table/database is then consulted to ascertain whether the requested data was compressed earlier.

If the data block was compressed (as ascertained by block 1134), the method proceeds to block 1136 to decompress the data block. After decompression, the method proceeds to block 1138, representing the I/O block to output the data to the device that requested it. On the other hand, if the data block was not compressed earlier, the method bypasses block 1136 to proceed directly to block 1138 to output the data to the device that requested it.
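The receive-path decisions of FIG. 11 (blocks 1130 through 1138) can be sketched in Python in the same spirit. As before, this is an illustrative sketch under stated assumptions: the names are hypothetical, the `encrypted` and `compressed` arguments stand for whatever header bit or table lookup supplies that information, zlib stands in for the decompression engine, and an insecure XOR routine stands in for decryption.

```python
import zlib
from typing import Optional


def xor_stand_in(data: bytes, key: bytes) -> bytes:
    """Placeholder for the decryption engine (NOT secure; illustration only)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def receive(data: bytes, encrypted: bool, compressed: bool,
            key: Optional[bytes] = None) -> bytes:
    """Undo encryption first, then compression, skipping whichever was not applied."""
    if encrypted:                        # block 1130: was the data encrypted earlier?
        if not key:
            raise ValueError("encrypted data received without a key")
        data = xor_stand_in(data, key)   # block 1132: decrypt (stand-in)
    if compressed:                       # block 1134: was the data compressed earlier?
        data = zlib.decompress(data)     # block 1136: decompress (zlib as stand-in)
    return data                          # block 1138: output to the requesting device


# Data from a legacy device passes through untouched.
legacy = receive(b"plain payload", encrypted=False, compressed=False)
```

Decryption precedes decompression because the transmit path compresses before it encrypts.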

As can be seen, when a data receive transaction is under consideration, the data optimization engine, in real time, decides whether the data was compressed earlier and decompresses it if needed. Uncompressed data transparently bypasses the decompression logic of the inventive data optimization engine. Irrespective of whether decompression is performed, another independent decision is made whether to decrypt. In this manner, the inventive data optimization engine is highly flexible and fully compatible with other subsystems/devices that do not have compression and/or encryption capabilities in that data from those devices may bypass the decompression/decryption logic of the data optimization engine. This flexibility permits the data optimization engine to be employed to upgrade a computer network in a modular, gradual fashion since the flexibility in working with both compressed and uncompressed data, as well as with encrypted and un-encrypted data, permits the network devices that implement the inventive data optimization engine to interoperate smoothly with other legacy and upgraded network devices. This flexibility also permits the data optimization engine to be employed to upgrade a computer system or a data storage system in a manner so as to minimize the number of changes required in the various subsystems of the computer system or the data storage system, since the flexibility in working with both compressed and uncompressed data, as well as with encrypted and un-encrypted data, permits the subsystems that implement the inventive data optimization engine to interoperate smoothly with other legacy or off-the-shelf subsystems of the computer system or data storage system.

In accordance with another aspect of the present invention, there is provided an inventive High Speed Optimized (HSO) compression/decompression technique to enable the data optimization engine to perform high speed, in-line adaptive loss-less compression/decompression. To facilitate discussion of the inventive HSO compression/decompression technique, some background discussion on LZW compression may be in order first.

LZW compression is the compression of a file into a smaller file using a table-based lookup algorithm invented by Abraham Lempel, Jacob Ziv, and Terry Welch. A particular LZW compression algorithm takes each input sequence of bits of a given length (for example, 12 bits) and creates an entry in a table (sometimes called a “dictionary” or “codebook”) for that particular bit pattern, consisting of the pattern itself and a shorter code. As input is read, any pattern that has been read before results in the substitution of the shorter code, effectively compressing the total amount of input to something smaller. The LZW algorithm does not need to include the look-up table of codes as part of the compressed file, because one particularly useful feature of LZW compression/decompression is that it is capable of building the table (i.e., dictionary or codebook) on the fly during decompression. That is, the decoding program that uncompresses the file is able to build the table itself by using the algorithm as it processes the input compressed data. An explanation of the LZW algorithm may be found in Mark Nelson's “LZW Data Compression” from the October, 1989 issue of Dr. Dobb's Journal (2800 Campus Drive, San Mateo, Calif. www.ddi.com). Further details regarding LZW compression may be found, for example, in the article “A Technique for High Performance Data Compression,” Terry A. Welch, IEEE Computer, 17(6), June 1984, pp. 8–19 (all of the above articles are incorporated herein by reference).

Further, LZW compression is highly adaptable to any type of input data. It is this adaptability of LZW that renders it highly useful as a starting point for the compression engine employed in the present invention. Many other data compression procedures require prior knowledge, or the statistics, of the data being compressed. Because LZW does not require prior knowledge of the data statistics, it may be utilized over a wide range of information types, which is typically the requirement in a general purpose data optimization engine.

LZW, as an algorithm for compression, is known in the art. An example of known LZW compression/decompression in operation is discussed below. Suppose the input string /WED/WE/WEE/WEB needs to be compressed using LZW.

TABLE 1 Example of standard LZW compression.

Character input   Code output   New code value and associated string
/W                /             256 = /W
E                 W             257 = WE
D                 E             258 = ED
/                 D             259 = D/
WE                256           260 = /WE
/                 E             261 = E/
WEE               260           262 = /WEE
/W                261           263 = E/W
EB                257           264 = WEB
<END>             B

In this example, LZW starts with a 4K dictionary, of which entries 0–255 refer to individual bytes, and entries 256–4095 refer to substrings. This type of dictionary is useful for text compression, for example. Each time a new code is generated it means a new string has been parsed. New strings are generated by appending the current character K to the end of an existing string w.

The algorithm for LZW compression is as follows:

set w = NIL
loop
  read a character K
  if wK exists in the dictionary
    w = wK
  else
    output the code for w
    add wK to the string table
    w = K
end loop
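The pseudocode above can be rendered as the following runnable Python sketch of standard LZW compression (not the inventive HSO technique). The function name is hypothetical, the input is assumed to consist of 8-bit characters, and the final flush of the pending string w at end of input, which the pseudocode leaves implicit, is added here so that the last code (B in Table 1) is emitted.

```python
from typing import Dict, List


def lzw_compress(text: str, max_codes: int = 4096) -> List[int]:
    """Standard LZW: codes 0-255 are single bytes, 256 and up are learned strings."""
    dictionary: Dict[str, int] = {chr(i): i for i in range(256)}
    next_code = 256
    w = ""                                   # set w = NIL
    output: List[int] = []
    for k in text:                           # read a character K
        wk = w + k
        if wk in dictionary:                 # wK exists in the dictionary
            w = wk
        else:
            output.append(dictionary[w])     # output the code for w
            if next_code < max_codes:        # 4K dictionary, as in the example
                dictionary[wk] = next_code   # add wK to the string table
                next_code += 1
            w = k
    if w:
        output.append(dictionary[w])         # flush the pending string at end of input
    return output


codes = lzw_compress("/WED/WE/WEE/WEB")
# codes reproduces the "Code output" column of Table 1:
# / W E D 256 E 260 261 257 B  ->  [47, 87, 69, 68, 256, 69, 260, 261, 257, 66]
```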

A sample run of LZW over a (highly redundant) input string can be seen in Table 1 above. The strings are built up character-by-character starting with a code value of 256. LZW decompression takes the stream of codes and uses it to exactly recreate the original input data. Just like the compression algorithm, the decompressor adds a new string to the dictionary each time it reads in a new code. All it needs to do in addition is to translate each incoming code into a string and send it to the output.

A sample run of the LZW decompressor is shown below in Table 2. Using the compressed code /WED<256>E<260><261><257>B as input to the decompressor, decompression yields the same string as the input to the compressor above.

TABLE 2 Example of standard LZW decompression.

Input code   Output string   New code value and associated string
/            /
W            W               256 = /W
E            E               257 = WE
D            D               258 = ED
256          /W              259 = D/
E            E               260 = /WE
260          /WE             261 = E/
261          E/              262 = /WEE
257          WE              263 = E/W
B            B               264 = WEB

As can be seen, one remarkable feature of LZW compression is that the entire dictionary has been transmitted to the decompressor without actually explicitly transmitting the dictionary. At the end of the run, the decompressor will have a dictionary identical to the one the encoder has, built up entirely as part of the decoding process.
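The way the decompressor rebuilds the dictionary from the code stream alone can be sketched in Python as follows. Again this illustrates standard LZW rather than the inventive HSO technique; the function name is hypothetical, and the branch handling a code that is not yet in the dictionary (the well-known case in which the encoder uses an entry immediately after creating it) goes beyond what Table 2 exercises.

```python
from typing import Dict, List


def lzw_decompress(codes: List[int], max_codes: int = 4096) -> str:
    """Standard LZW decoding; the dictionary is rebuilt while decoding."""
    dictionary: Dict[int, str] = {i: chr(i) for i in range(256)}
    next_code = 256
    if not codes:
        return ""
    prev = dictionary[codes[0]]              # the first code is always a single byte
    output = [prev]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:              # code refers to the entry being created
            entry = prev + prev[0]
        else:
            raise ValueError(f"invalid LZW code {code}")
        output.append(entry)
        if next_code < max_codes:
            # New entry: previous string plus first character of the current one.
            dictionary[next_code] = prev + entry[0]
            next_code += 1
        prev = entry
    return "".join(output)


text = lzw_decompress([47, 87, 69, 68, 256, 69, 260, 261, 257, 66])
# text is "/WED/WE/WEE/WEB", the original input of Table 1.
```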

The above discussion relates to the known LZW compression algorithm. To optimize the compression for use in the data optimization engine of the present invention, several improvements are added. In one embodiment, to minimize the size of the dictionary and the time spent looking up the dictionary, the invention limits the number of different output codes to a fixed number. In other words, whereas the standard LZW compression algorithm assumes that there would be a sufficient number of output codes to represent each unique bit pattern in the dictionary, the invention in one embodiment is optimized to guarantee correct compression and decompression even if there are far fewer output code values than the number of unique bit patterns requiring storage in the dictionary.

One disadvantage with storing one unique compression output code with each unique bit pattern in the dictionary is that for a universal data optimization engine, it is often not known in advance what type of data would be encountered, how compressible the input data would be, and thus how many unique bit patterns may be encountered. In such a case, the known LZW algorithm would require one to overprovision the dictionary, i.e., to allot a sufficiently large number of code values and a sufficiently large amount of storage space so as to ensure that there is a unique code for each unique bit pattern to be stored into the table for all types of data that may be encountered.

However, the challenge with limiting the number of output codes and the size of the dictionary is that there exists a risk that the number of unique bit patterns encountered would exceed the number of output codes provided. When the number of unique bit patterns that need to be stored in the dictionary exceeds the number of output codes in the dictionary, known LZW compression techniques break down, as far as the inventor is aware. Yet, limiting the number of output codes and the size of the dictionary is often the key to keeping the memory size to a reasonable number and the dictionary search time low to enable real-time operation and/or to make a universal data optimization engine.

In accordance with one aspect of the present invention, there is provided an adaptive High Speed Optimized (HSO) compression technique that addresses the need for a high speed, low memory usage, adaptive compression technique, and which can be implemented in hardware for high speed, in-line operation, or in software for portability. The inventive HSO compression technique in accordance with one embodiment of the present invention may be better understood with reference to Table 3 and FIG. 12 herein.

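TABLE 3 Example of inventive HSO compression.

| Row No. | Input character (2 bits) | Bit pattern to search (3 bits + 2 bits) | Counter value (3 bits) | CAM content (3 bits) | CAM address (5 bits) | Code output (3 bits) | Note |
|---|---|---|---|---|---|---|---|
| R1 | 1a | | | | | | |
| R2 | 1b | 1a1b | 4 | 4 | 11 | 1a | |
| R3 | 1c | 1b1c | | | | | |
| R4 | 1d | 41d | 5 | 5 | 41 | 4 | |
| R5 | 1e | 1d1e | | | | | |
| R6 | 1f | 41f | | | | | |
| R7 | 1g | 51g | 6 | 6 | 51 | 5 | |
| R8 | 1h | 1g1h | | | | | |
| R9 | 1i | 41i | | | | | |
| R10 | 1j | 51j | | | | | |
| R11 | 1k | 61k | 7 → 4 | 4 | 61 | 6 | The CAM address 11, which previously stored the value 4 (row R2), is now freed up |
| R12 | 1L | 1k1L | 5 | | | 1k | Special case |
| R13 | 1m | 1L1m | 6 | | | 1L | Special case |
| R14 | 1n | 1m1n | 7 → 4 | | | 1m | Special case |
| R15 | 1o | 1n1o | 5 | | | 1n | Special case |
| R16 | 1p | 1o1p | 6 | 6 | 11 | 1o | The CAM address 51, which previously stored the value 6 (row R7), is now freed up |
| R17 | 0q | 1p0q | 7 → 4 | 4 | 10 | 1p | The CAM address 61, which previously stored the value 4 (row R11), is now freed up |
| R18 | 1r | 0q1r | 5 | 5 | 01 | 0q | |
| R19 | 1s | 1r1s | | | | | |
| R20 | 1t | 61t | 6 | 6 | 61 | 6 | The CAM address 11, which previously stored the value 6 (row R16), is now freed up |
| R21 | 1u | 1t1u | 7 → 4 | | | 1t | Special case |
| R22 | 1v | 1u1v | 5 | 5 | 11 | 1u | |
| R23 | EOF | 1v | | | | 1v | |
| R24 | | | | | | EOF | |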

In the example of Table 3, the input pattern is as shown in rows R1–R22, with row R23 having the special input EOF, which marks the end of the input file. Each input “character” is assumed to be two bits (and thus can have the value of 0, 1, 2, or 3). Since 3 is the maximum value of the input character, the value 4 is selected to be the first counter value representing the smallest code output from the dictionary. It should be apparent that any value higher than 3 can be chosen as the smallest code output from the dictionary, albeit with some loss of efficiency since a larger number of bits will be required to represent a larger output code value.

To illustrate the ability of the present inventive technique to compress and decompress with only a limited number of output code values to save memory, the number of bits of the output code value will be artificially constrained to be 3 and the maximum value to 6. The value 7 (the largest value that can be produced using 3 bits) is used, in this example, to represent the EOF flag in the output stream to be sent to the decompressor. Thus, there are only 3 additional output code values (i.e., 4, 5, and 6), other than the input characters and the EOF flag, that will be in the compressed output stream. As can be appreciated by one skilled in the art, this allows the content addressable memory dictionary (or a dictionary implemented in any other type of memory technology) to be vastly reduced in size and also substantially simplifies the process of searching through the dictionary for a matching bit pattern.
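The partitioning of the output code values described above may be sketched as follows. This is a minimal illustration under the assumptions of the Table 3 example (the largest code value is reserved for EOF and the dictionary codes begin immediately above the largest input character); the function name code_space and its parameters are hypothetical and are not part of the disclosure.

```python
def code_space(char_bits, code_bits):
    """Split the output code values into literals, dictionary codes, and EOF."""
    max_char = (1 << char_bits) - 1   # largest raw input character
    eof_code = (1 << code_bits) - 1   # top value reserved for EOF (per the example)
    min_counter = max_char + 1        # smallest dictionary code
    max_counter = eof_code - 1        # largest dictionary code
    if min_counter > max_counter:
        raise ValueError("code width leaves no room for dictionary codes")
    return {
        "literals": list(range(0, max_char + 1)),
        "dictionary_codes": list(range(min_counter, max_counter + 1)),
        "eof": eof_code,
    }


space = code_space(char_bits=2, code_bits=3)
print(space["literals"])          # [0, 1, 2, 3]
print(space["dictionary_codes"])  # [4, 5, 6]
print(space["eof"])               # 7
```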

In the example of Table 3, the input sequence is 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,1,1,1,1,1,EOF. To simplify understanding, the input characters are given subscripts (a–v) to aid the reader in tracking the explanation through the table. It is to be understood that these subscripts (a–v) are present only to aid the reader in understanding the example of Table 3; these subscripts are not present in the stream of data. Also, two temporary string variables CharIn1 and CharIn2 are employed in FIG. 12 to track the input and output values when stepping through the compression technique.

With reference to Table 3, in row R1, the input character “1” is inputted (FIG. 12: 1202). For ease of reference to Table 3, the subscript “a” is employed in the discussion. Thus, row R1 has “1a” as the input character. Since this is the first input value, nothing is written to the dictionary or outputted.

In row R2, the input character is 1b (FIG. 12: 1206). The bit pattern is now 1a1b (FIG. 12: 1208). It should be noted that prior to forming the bit pattern to search through the dictionary, the prior input character 1a is padded (i.e., pre-pended with zero) to make it 3 bits to match the size of the output code value (FIG. 12: 1204). Because of this padding, the resultant bit pattern is now uniformly 5 bits at all times, which simplifies the search and storage process. Since 11 is not in the dictionary (FIG. 12: 1210), the counter value is incremented to 4 (FIG. 12: 1212/1214) and is written to the CAM at CAM address 11 (FIG. 12: 1216). The output code is 1a (FIG. 12: 1218/1220/1204).
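The padding and merging step just described may be illustrated by the following sketch, which forms the uniform 5-bit search pattern from a 3-bit prefix (a zero-padded character or a dictionary code) and a 2-bit input character. The names search_pattern and as_bits are hypothetical helpers introduced only for this illustration.

```python
CHAR_BITS = 2  # width of one input character in the Table 3 example
CODE_BITS = 3  # width of one output code in the Table 3 example


def search_pattern(prefix, char):
    """Merge a 3-bit prefix and a 2-bit character into one 5-bit pattern."""
    if not 0 <= prefix < (1 << CODE_BITS):
        raise ValueError("prefix does not fit in the code width")
    if not 0 <= char < (1 << CHAR_BITS):
        raise ValueError("character does not fit in the character width")
    # Holding the prefix in an integer pads it with leading zeros implicitly.
    return (prefix << CHAR_BITS) | char


def as_bits(pattern):
    """Render a pattern as a 5-character bit string."""
    return format(pattern, "0{}b".format(CODE_BITS + CHAR_BITS))


print(as_bits(search_pattern(1, 1)))  # 00101  (pattern 1a1b of row R2)
print(as_bits(search_pattern(4, 1)))  # 10001  (pattern 41d of row R4)
```

Because a raw character such as 1 and a dictionary code such as 4 occupy the same 3-bit field, every search pattern has the same width regardless of what the prefix represents.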

In row R3, the input character is 1c (FIG. 12: 1206). Since “1a” has just been outputted, the remaining bit pattern representative of 1b is padded (FIG. 12: 1204) and then merged with 1c to form the bit pattern 1b1c (FIG. 12: 1208). Since 11 is already in the dictionary (stored at row R2), the dictionary is not updated and nothing is outputted. However, the content of CAM address location 11 (which is 4 as shown in row R2) is noted for use with the next input character (FIG. 12: 1210/1222).

In row R4, the input character is 1d (FIG. 12: 1206). Now the bit pattern is 41d (FIG. 12: 1208), which is a merging of the content of CAM address location 11 (which is 4 as shown in row R2), and the new input character 1d. Since the content of CAM address location 11 (which is 4 as shown in row R2) is already 3 bits, no padding is needed, and the new bit pattern 41d is 5 bits as before, which simplifies searching and storage. Since 41 is not in the dictionary (FIG. 12: 1210), the counter value is increased to 5 (FIG. 12: 1212/1214) and is written to the CAM at CAM address 41 (FIG. 12: 1216). The output code is 4 (FIG. 12: 1218/1220/1204).

In row R5, the input character is 1e (FIG. 12: 1206). Now the bit pattern is 1d1e (FIG. 12: 1208), which is a merging of what remains (1d) of the previous bit pattern for searching (41d) after a code is outputted (4). Note that since what remains comes from the input character 1d, the subscript “d” is again used for ease of understanding. Since 11 is already in the dictionary (stored at row R2), the dictionary is not updated and nothing is outputted. However, the content of CAM address location 11 (which is 4 as shown in row R2) is noted for use with the next input character (FIG. 12: 1210/1222).

In row R6, the input character is 1f (FIG. 12: 1206). Now the bit pattern is 41f (FIG. 12: 1208), which is a merging of the content of CAM address location 11 (which is 4 as shown in row R2), and the new input character 1f. Since 41 is already in the dictionary (stored at row R4), the dictionary is not updated and nothing is outputted. However, the content of CAM address location 41 (which is 5 as shown in row R4) is noted for use with the next input character (FIG. 12: 1210/1222).

In row R7, the input character is 1g (FIG. 12: 1206). Now the bit pattern is 51g (FIG. 12: 1208), which is a merging of the content of CAM address location 41 (which is 5 as shown in row R4), and the new input character 1g. Since 51 is not in the dictionary, the counter value is increased to 6 and is written to the CAM at CAM address 51 (FIG. 12: 1210/1212/1214/1216). The output code is 5 (FIG. 12: 1218/1220/1204).

In row R8, the input character is 1h (FIG. 12: 1206). Now the bit pattern is 1g1h (FIG. 12: 1208), which is a merging of what remains (1g) of the previous bit pattern for searching (51g) after a code is outputted (5). Since 11 is already in the dictionary (stored at row R2), the dictionary is not updated and nothing is outputted. However, the content of CAM address location 11 (which is 4 as shown in row R2) is noted for use with the next input character (FIG. 12: 1210/1222).

In row R9, the input character is 1i (FIG. 12: 1206). Now the bit pattern is 41i (FIG. 12: 1208), which is a merging of the content of CAM address location 11 (which is 4 as shown in row R2), and the new input character 1i. Since 41 is already in the dictionary (stored at row R4), the dictionary is not updated and nothing is outputted. However, the content of CAM address location 41 (which is 5 as shown in row R4) is noted for use with the next input character (FIG. 12: 1210/1222).

In row R10, the input character is 1j (FIG. 12: 1206). Now the bit pattern is 51j (FIG. 12: 1208), which is a merging of the content of CAM address location 41 (which is 5 as shown in row R4), and the new input character 1j. Since 51 is already in the dictionary (stored at row R7), the dictionary is not updated and nothing is outputted. However, the content of CAM address location 51 (which is 6 as shown in row R7) is noted for use with the next input character (FIG. 12: 1210/1222).

In row R11, the input character is 1k (FIG. 12: 1206). Now the bit pattern is 61k (FIG. 12: 1208), which is a merging of the content of CAM address location 51 (which is 6 as shown in row R7), and the new input character 1k. Since 61 is not in the dictionary, the counter ordinarily would be incremented and that value (7 in this case) stored into the dictionary. However, for the purpose of illustrating this embodiment of the invention, the counter value was arbitrarily constrained at 6 as the maximum value. Thus, the counter overflows (FIG. 12: 1212/1224) and returns to 4, as shown in row R11 (FIG. 12: 1218/1220/1204).

Also in row R11, the value 4 was noted to have been associated with CAM address location 11 earlier (see row R2) (FIG. 12: 1226/1216). In one advantageous embodiment, a small shadow memory, which is employed to store associative pairings between a CAM content value and its associated CAM address, is searched to determine which CAM address was used previously to store the value 4 (FIG. 12: 1228). That is, the shadow memory addresses are the counter values, and the content stored at each address in the shadow memory is the CAM address currently used to store the counter value that forms the shadow memory address. The use of a shadow memory advantageously allows the CAM address to be rapidly ascertained for any given counter value. This shadow memory is updated every time there is an update to the CAM. Once this CAM address location 11 is ascertained, it is freed up in the CAM (FIG. 12: 1230). In other words, CAM address 11 is now considered free to store another value. In one embodiment, each CAM address has associated with it a Free/Not Free flag bit, and the flag bit is set whenever that CAM address is written to and reset when that CAM address is freed. Alternatively or additionally, the content of that CAM address may be reset to 0 when the CAM address is freed. Once CAM address location 11 is freed, the value 4 is written into location 61 (FIG. 12: 1216), and the code value 6 is outputted (FIG. 12: 1218/1220/1204).
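The interplay between the wrapping counter, the CAM, and the shadow memory may be sketched as follows. This is a software model under stated assumptions: the CAM is modeled as a Python dict keyed by search pattern (written in the notation of Table 3, e.g. "11"), the shadow memory as a dict keyed by counter value, and the class name RecyclingDictionary is hypothetical. The sketch models only the ordinary store path; the special cases of Table 3, in which the counter advances without a CAM write, are not modeled.

```python
class RecyclingDictionary:
    """Bounded dictionary that reuses code values by freeing old entries."""

    def __init__(self, min_code=4, max_code=6):
        self.min_code = min_code
        self.max_code = max_code
        self.counter = min_code - 1
        self.cam = {}     # search pattern (CAM address) -> code (CAM content)
        self.shadow = {}  # code -> search pattern currently holding that code

    def _next_code(self):
        # Increment the counter and wrap to the minimum on overflow.
        self.counter += 1
        if self.counter > self.max_code:
            self.counter = self.min_code
        return self.counter

    def store(self, pattern):
        """Assign the next code to a pattern that is not yet in the CAM."""
        if pattern in self.cam:
            raise ValueError("pattern is already stored")
        code = self._next_code()
        old_pattern = self.shadow.get(code)
        if old_pattern is not None and self.cam.get(old_pattern) == code:
            del self.cam[old_pattern]  # free the CAM address that held this code
        self.cam[pattern] = code
        self.shadow[code] = pattern    # keep the shadow memory in step with the CAM
        return code


d = RecyclingDictionary()
codes = [d.store(p) for p in ("11", "41", "51", "61")]  # rows R2, R4, R7, R11
print(codes)        # [4, 5, 6, 4]
print("11" in d.cam)  # False: address 11 was freed when code 4 was reused
```

The shadow lookup is a direct index by code value, which is why the freed CAM address can be found without searching the CAM by content.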

In row R12, the input character is 1L (FIG. 12: 1206). The search bit pattern is now 1k1L (FIG. 12: 1208), which is a merging of what remains (1k) of the previous bit pattern for searching (61k) after a code is outputted (6). However, this is a special case. At this point, an explanation of the special case is in order. A special case exists when the current search bit pattern is the same as the search bit pattern that is associated with the next input character. Using an input buffer and a pipelined input structure in the input stage of the compressor, for example, it is possible to determine in advance the next input character and the search bit pattern that would be employed with that next input character. If one refers to the next row R13, it is possible to see that the next input character is 1m, and the next search bit pattern would be 1L1m. When the special case is encountered, the invention simply increments the counter (if such does not cause the counter to overflow) and outputs the first part of the search bit pattern. Thus, the counter is incremented to 5 (FIG. 12: 1212/1214) and the output code is 1k (FIG. 12: 1226/1218/1220/1204).

In row R13, the input character is 1m (FIG. 12: 1206). The search bit pattern is now 1L1m (FIG. 12: 1208), which is a merging of what remains (1L) of the previous bit pattern for searching (1k1L) after a code is outputted (1k). However, this is a special case. If one refers to the next row R14, it is possible to see that the next input character is 1n, and the next search bit pattern would be 1m1n. The special case, it should be recalled, exists when the current search bit pattern is the same as the search bit pattern that is associated with the next input character. When the special case is encountered, the invention simply increments the counter (if such does not cause the counter to overflow) and outputs the first part of the search bit pattern. Thus, the counter is incremented to 6 (FIG. 12: 1212/1214) and the output code is 1L (FIG. 12: 1226/1218/1220/1204).

In row R14, the input character is 1n (FIG. 12: 1206). The search bit pattern is now 1m1n (FIG. 12: 1208), which is a merging of what remains (1m) of the previous bit pattern for searching (1L1m) after a code is outputted (1L). However, this is a special case. If one refers to the next row R15, it is possible to see that the next input character is 1o, and the next search bit pattern would be 1n1o. The special case, it should be recalled, exists when the current search bit pattern is the same as the search bit pattern that is associated with the next input character. When the special case is encountered, the invention simply increments the counter (if such does not cause the counter to overflow) and outputs the first part of the search bit pattern. However, the increment of the counter causes it to overflow, and it is reset to 4 (FIG. 12: 1212/1224), as shown in row R14. The output code is 1m (FIG. 12: 1226/1218/1220/1204).

In row R15, the input character is 1o (FIG. 12: 1206). The search bit pattern is now 1n1o (FIG. 12: 1208), which is a merging of what remains (1n) of the previous bit pattern for searching (1m1n) after a code is outputted (1m). However, this is a special case. If one refers to the next row R16, it is possible to see that the next input character is 1p, and the next search bit pattern would be 1o1p. The special case, it should be recalled, exists when the current search bit pattern is the same as the search bit pattern that is associated with the next input character. When the special case is encountered, the invention simply increments the counter (if such does not cause the counter to overflow) and outputs the first part of the search bit pattern. Thus, the counter is incremented to 5 (FIG. 12: 1212/1214) and the output code is 1n (FIG. 12: 1226/1218/1220/1204).

In row R16, the input character is 1p (FIG. 12: 1206). Now the bit pattern is 1o1p (FIG. 12: 1208), which is a merging of what remains (1o) of the previous bit pattern for searching (1n1o) after a code is outputted (1n). This is not a special case since the next input character in row R17 is 0q, and the next search bit pattern is 1p0q, which is different from the current search bit pattern 1o1p. Since CAM address 11 is not used in the dictionary (it was freed up earlier in row R11), the counter value is incremented to 6 (FIG. 12: 1212/1214) and is written to the CAM at CAM address 11. The output code is 1o (FIG. 12: 1226/1218/1220/1204).

In row R17, the input character is 0q (FIG. 12: 1206). Now the bit pattern is 1p0q (FIG. 12: 1208), which is a merging of what remains (1p) of the previous bit pattern for searching (1o1p) after a code is outputted (1o). Since 10 is not in the dictionary, the counter ordinarily would be incremented and that value (7 in this case) stored into the dictionary. However, for the purpose of illustrating this embodiment of the invention, the counter value is arbitrarily constrained at 6. Thus, the counter overflows and returns to 4 (FIG. 12: 1212/1224), as shown in row R17.

Also in row R17, the value 4 was noted to have been associated with CAM address location 61 earlier (see row R11) (FIG. 12: 1216). Once this CAM address location 61 is ascertained (FIG. 12: 1228), it is freed up in the CAM (FIG. 12: 1230). In other words, CAM address 61 is now considered free to store another value. Once CAM address location 61 is freed, the value 4 is written into location 10 (FIG. 12: 1216), and the code value 1p is outputted (FIG. 12: 1218/1220/1204).

In row R18, the input character is 1r (FIG. 12: 1206). Now the bit pattern is 0q1r (FIG. 12: 1208), which is a merging of what remains (0q) of the previous bit pattern for searching (1p0q) after a code is outputted (1p). Since 01 is not in the dictionary (FIG. 12: 1210), the counter value is increased to 5 (FIG. 12: 1212/1214) and is written to the CAM at CAM address 01 (FIG. 12: 1216). The output code is 0q (FIG. 12: 1218/1220/1204).

In row R19, the input character is 1s (FIG. 12: 1206). Now the bit pattern is 1r1s (FIG. 12: 1208), which is a merging of what remains (1r) of the previous bit pattern for searching (0q1r) after a code is outputted (0q). Since 11 is already in the dictionary (stored at row R16), the dictionary is not updated and nothing is outputted. However, the content of CAM address location 11 (which is 6 as shown in row R16) is noted for use with the next input character (FIG. 12: 1210/1222).

In row R20, the input character is 1t (FIG. 12: 1206). Now the bit pattern is 61t (FIG. 12: 1208), which is a merging of the content of CAM address location 11 (which is 6 as shown in row R16), and the new input character 1t. Since CAM address 61 is not used in the dictionary (it was freed up in row R17), the counter value is increased to 6 (FIG. 12: 1210/1212/1214). In row R20, the value 6 was noted to have been associated with CAM address location 11 earlier (see row R16) (FIG. 12: 1216). Once this CAM address location 11 is ascertained (FIG. 12: 1228), it is freed up in the CAM (FIG. 12: 1230). In other words, CAM address 11 is now considered free to store another value. Once the CAM address 11 is freed, the counter value is written to the CAM at CAM address 61 (FIG. 12: 1216). The output code is 6 (FIG. 12: 1218/1220/1204).

In row R21, the input character is 1u (FIG. 12: 1206). Now the bit pattern is 1t1u (FIG. 12: 1208), which is a merging of what remains (1t) of the previous bit pattern for searching (61t) after a code is outputted (6). However, this is a special case. If one refers to the next row R22, it is possible to see that the next input character is 1v, and the next search bit pattern would be 1u1v. The special case, it should be recalled, exists when the current search bit pattern is the same as the search bit pattern that is associated with the next input character. When the special case is encountered, the invention simply increments the counter (if such does not cause the counter to overflow) and outputs the first part of the search bit pattern. However, the increment of the counter causes it to overflow, and it is reset to 4 (FIG. 12: 1212/1224), as shown in row R21. The output code is 1t (FIG. 12: 1226/1218/1220/1204).

In row R22, the input character is 1v (FIG. 12: 1206). Now the bit pattern is 1u1v (FIG. 12: 1208), which is a merging of what remains (1u) of the previous bit pattern for searching (1t1u) after a code is outputted (1t). Since 11 is not in the dictionary (the CAM address location 11 was freed in row R20), the counter value is increased to 5 (FIG. 12: 1212/1214) and is written to the CAM at CAM address 11 (FIG. 12: 1216). The output code is 1u (FIG. 12: 1218/1220/1204).

In row R23, the special end-of-file character EOF (FIG. 12: 1206–1207) is encountered, and the compressor outputs the remaining character (FIG. 12: 1240), which is 1v (what remains of the previous search bit pattern 1u1v after the code 1u is outputted). The compression process ends at block 1242 of FIG. 12.

Note that the CAM only stores the counter value as its content, which allows each row in the CAM table to be relatively small. This is advantageous in helping to reduce the overall CAM size. The size of the CAM is further reduced by allowing the CAM address to be reused. Although a greater number of operations is required to search for CAM addresses in the shadow memory, to update the shadow memory, and to reuse the CAM addresses, it is noted that the speed of logic circuitry nowadays typically outpaces the speed of memory devices. Thus, it is believed that the greater number of logic operations does not materially reduce the speed of the compression engine since the factors that limit compression engine speed tend to be memory-related in the first place.

Table 4 shows an HSO decompression example for the bit pattern outputted by the compressor discussed in connection with Table 3. In one preferred embodiment, the invention employs Random Access Memory instead of CAM to store the dictionary. It should be noted, however, that although the use of RAM simplifies the implementation of the decompressor, it will be apparent to those skilled in the art that any memory technology may be employed for the dictionary of the decompressor.

The counter is employed as the address value for storing and accessing the bit patterns used to decompress the compressed data. Since the counter value, with its relatively small value range, is employed for addressing memory locations, the amount of memory required is advantageously quite small. Thus, it is possible to implement the dictionary without resorting to a CAM. However, it should be recognized that the decompressor of the present invention is not limited to the decompression technique disclosed herein (i.e., a standard LZW algorithm may be employed instead).

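TABLE 4 Example of invention's HSO decompression.

| Row No. | New Code (3 bits) | Old Code (3 bits) | Zero Column (3 bits) | Char Out (2 bits) | Decomp Output (2 bits) | Counter/Address (3 bits) | Content (Old Code + Char Out) (5 bits) | Group |
|---|---|---|---|---|---|---|---|---|
| R1 | 1a | 0 | 0 | 1a | 1a | | | G1 |
| R2 | 4 | 1a | 1a | 1a | 1a | | | G2 |
| R3 | 1a | 1a | 0 | 1a | 1a | 4 | 1x1y | G2 |
| R4 | 5 | 4 | 4 | 1a | 1x | | | G3 |
| R5 | 4 | 4 | 1x | 1y | 1y | | | G3 |
| R6 | 1x | 4 | 0 | 1x | 1a | 5 | 41z | G3 |
| R7 | 6 | 5 | 5 | 1x | 1x | | | G4 |
| R8 | 5 | 5 | 4 | 1z | 1y | | | G4 |
| R9 | 4 | 5 | 1x | 1y | 1z | | | G4 |
| R10 | 1x | 5 | 0 | 1x | 1x | 6 | 51x | G4 |
| R11 | 1c | 6 | 0 | 1c | 1c | 4 | 61c | G5 |
| R12 | 1d | 1c | 0 | 1d | 1d | 5 | 1c1d | G6 |
| R13 | 1e | 1d | 0 | 1e | 1e | 6 | 1d1e | G7 |
| R14 | 1f | 1e | 0 | 1f | 1f | 4 | 1e1f | G8 |
| R15 | 1g | 1f | 0 | 1g | 1h | 5 | 1f1g | G9 |
| R16 | 1h | 1g | 0 | 1h | 1g | 6 | 1g1h | G10 |
| R17 | 0j | 1h | 0 | 0j | 0j | 4 | 1h0j | G11 |
| R18 | 6 | 0j | 1g | 1h | 1h | | | G12 |
| R19 | 1g | 0j | 0 | 1g | 1g | 5 | 0j1g | G12 |
| R20 | 1k | 6 | 0 | 1k | 1k | 6 | 61k | G13 |
| R21 | 1m | 1k | 0 | 1m | 1m | 4 | 1k1m | G14 |
| R22 | 1n | 1m | 0 | 1n | 1n | 5 | 1m1n | G15 |
| R23 | EOF | | | | | | | |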

In the example of Table 4, the input pattern is as shown in the column New Code in rows R1, R2, R4, R7, R11–R18, R20–R22, with row R23 having the special input EOF, which marks the end of the input file. Rows R3, R5–R6, R8–R10, and R19 operate on internally generated input values (shown by the italicized numbers in the New Code column for these rows) to generate decompressed output values.

The column New Code contains the values for the externally input characters (referred to herein as External New Code) for the decompression cycles, as well as the values for the internally generated input values (referred to herein as Internal New Code) for the decompression cycles. To clarify, External New Codes represent the compressed data received by the decompressor from an external source. The Internal New Codes represent the interim values generated by the decompressor itself to facilitate decompression and the generation of the decompressed output values. Both the External New Code and Internal New Code values are 3 bits in the example herein.

The columns Old Code and Zero contain intermediate values generated for the decompression cycles. The column Char Out contains the values outputted from the decompressor, which are further processed into the decompressed output values, as shown in column Decomp Output. Old Code and Zero values are all 3 bits long in the present example, whereas the Char Out values are 2 bits long.

The dictionary comprises two columns: 1) the Counter column, which represents the address into the RAM, and 2) the Content column, which represents what is stored into the dictionary. The Counter value, which is generated by a counter circuit or software, is 3 bits long. As will be seen later during the explanation of the decompression steps, the value of each entry in the Content column comprises the values from both the Old Code and Char Out columns for the current decompression cycle. Accordingly, each Content value is 5 bits long.

Since the Char Out value is 2 bits, the maximum value of Char Out is 11 (binary) or 3 (decimal). The counter value is preferably set to be larger than the maximum value of Char Out. In the example of Table 4, the counter value has a range of 4–6 to match the conditions imposed during compression, with 4 being the MinCounter value and 6 being the MaxCounter value. In general, the counter value range is known to both the compressor and the decompressor. Initially, the Counter is set to be MinCounter - 1, or 3.

The example of Table 4 will be more easily understood with reference to FIG. 13. In row R1, the initial compressed value 1a is received. Again, the subscript “a” and other subscripts are added to help the reader follow the explanation. They do not exist in implementation. For this initial value, similar to the start of the standard LZW decompression technique, a value is outputted (Char Out = New Code, or 1a). It should be noted that since Char Out is 2 bits and New Code is 3 bits, it is necessary to remove the MSB of New Code to form Char Out. For this first cycle, both the Old Code column and the Zero column are set to 0. These steps are shown in blocks 1302, 1304 and 1306 in FIG. 13.

In row R2, it is ascertained whether the previous row (i.e., cycle) has a value 0 in the Zero column (block 1308). If the Zero column of the previous row (row R1) has the value 0, the Old Code column is set to be equal to the previous External New Code value (block 1310), which is 1a in this case. The New Code value for the current row is received, as shown by the value 4 in row R2 of Table 4 (block 1312). It is then ascertained (block 1314) whether the New Code for the current cycle (which is an External New Code in this case) is less than the MinCounter value (which is 4 in the present example). Since the External New Code is 4, the method proceeds to block 1316 to ascertain whether the New Code is in the dictionary. In one embodiment, the determination of whether a code is already in the dictionary is as follows. If the counter value is less than New Code and there is no overflow of the counter, then it is assumed that the New Code is not in the dictionary. On the other hand, if the counter value is greater than or equal to New Code or there is an overflow, then the New Code is assumed to be in the dictionary.
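The two tests of blocks 1314 and 1316 may be sketched as a single classification function. This is an illustration only: the function name classify_new_code and the returned labels are hypothetical, and the overflowed argument is assumed to be a flag, maintained by the caller, indicating that the counter has overflowed.

```python
MIN_COUNTER = 4  # smallest dictionary code in the Table 4 example


def classify_new_code(new_code, counter, overflowed):
    """Classify a New Code as 'primary', 'in_dictionary', or 'not_in_dictionary'."""
    if new_code < MIN_COUNTER:
        # Block 1314: the code is a raw character (primary case).
        return "primary"
    if counter < new_code and not overflowed:
        # Block 1316: this address has not been written yet.
        return "not_in_dictionary"
    # Counter has reached or passed the code, or the counter has overflowed.
    return "in_dictionary"


print(classify_new_code(4, counter=3, overflowed=False))  # not_in_dictionary (row R2)
print(classify_new_code(1, counter=3, overflowed=False))  # primary (row R3)
print(classify_new_code(4, counter=4, overflowed=False))  # in_dictionary (row R5)
```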

Recall that the dictionary is dynamically built for adaptive decompression. In this case, the address location 4 has not been used, and the method proceeds to block 1318 to set the value in the Zero column to be equal to the Old Code value (or 1a). The Char Out value is set (block 1320) to be equal to the Char Out value of the previous cycle (row R1) or the value 1a. The method then returns to block 1308 as shown in FIG. 13.

With reference to row R3, it is ascertained (block 1308) whether the previous row (i.e., cycle) has a value 0 in the Zero column. Since the Zero column of the previous row (row R2) has the value 1a, the method proceeds to block 1322 to obtain an Internal New Code, which is equal to the Zero column value of the previous cycle (row R2). That value is 1a as shown in Table 4. Next, the Old Code value is set (block 1324) to be equal to the value of the Old Code value in the previous cycle (row R2). That value is 1a as shown in Table 4. Next, the method proceeds to block 1314 to ascertain whether the New Code for the current cycle (which is an Internal New Code in this case) is less than the MinCounter value (which is 4 in the present example). Since the Internal New Code is 1, the method proceeds to block 1326 to put the value 0 into the Zero column. The Char Out value is then set (block 1328) to be equal to Internal New Code (which is 1a in this case). It should be noted that since Char Out is 2 bits and New Code is 3 bits, it is necessary to remove the MSB of New Code to form Char Out. The method then proceeds to block 1330 to increment the counter from its current value, going from 3 to 4. In block 1332, it is ascertained whether the counter has overflowed by the increment step of block 1330. If an overflow occurs, the counter is reset in block 1334. Since the current counter value 4 is not greater than MaxCounter (or 6 in this example), the method proceeds to block 1336 to store the Content value (Old Code + Char Out) into the address location specified by Counter. Thus, the value 11 is stored into address location 4 for row R3. For ease of explanation, these have been marked with subscripts as 1x1y in Table 4 (the subscripts have no meaning in actual implementation and serve merely as an explanatory aid). The method then returns to block 1308 as shown in FIG. 13.
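The primary-case path of blocks 1326 through 1336 may be sketched as follows. The sketch assumes the RAM dictionary is modeled as a Python dict indexed by the counter value and that the 5-bit Content is packed with Old Code in the upper 3 bits; the function name primary_case_step is hypothetical.

```python
MIN_COUNTER, MAX_COUNTER = 4, 6  # counter range of the Table 4 example
CHAR_BITS = 2                    # width of Char Out


def primary_case_step(new_code, old_code, counter, ram):
    """Handle a New Code below MinCounter; return (zero, char_out, counter)."""
    zero = 0                                      # block 1326
    char_out = new_code & ((1 << CHAR_BITS) - 1)  # block 1328: drop the MSB
    counter += 1                                  # block 1330
    if counter > MAX_COUNTER:                     # block 1332: overflow?
        counter = MIN_COUNTER                     # block 1334: reset
    # Block 1336: Content = Old Code (3 bits) followed by Char Out (2 bits).
    ram[counter] = (old_code << CHAR_BITS) | char_out
    return zero, char_out, counter


ram = {}
r3 = primary_case_step(new_code=1, old_code=1, counter=3, ram=ram)  # row R3
r6 = primary_case_step(new_code=1, old_code=4, counter=4, ram=ram)  # row R6
print(r3, format(ram[4], "05b"))  # (0, 1, 4) 00101
print(r6, format(ram[5], "05b"))  # (0, 1, 5) 10001
```

The two stored values correspond to the Content entries 1x1y (address 4) and 41z (address 5) of Table 4.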

In row R4, it is ascertained whether the previous row (i.e., cycle) has a value 0 in the Zero column (block 1308). Since the Zero column of the previous row (row R3) has the value 0, the Old Code column is set to be equal to the previous External New Code value (block 1310), which is 4 in this case. The New Code value for the current row is received, as shown by the value 5 in row R4 of Table 4 (block 1312). It is then ascertained (block 1314) whether the New Code for the current cycle (which is an External New Code in this case) is less than the MinCounter value (which is 4 in the present example). Since the External New Code is 5, the method proceeds to block 1316 to ascertain whether this New Code is in the dictionary. In this case, the address location 5 has not been used, and the method proceeds to block 1318 to set the value in the Zero column to be equal to the Old Code value (or 4). The Char Out value is set (block 1320) to be equal to the Char Out value of the previous cycle (row R3) or the value 1a. The method then returns to block 1308 as shown in FIG. 13.

With reference to row R5, it is ascertained (block 1308) whether the previous row (i.e., cycle) has a value 0 in the Zero column. Since the Zero column of the previous row (row R4) has the value 4, the method proceeds to block 1322 to obtain an Internal New Code, which is equal to the Zero column value of the previous cycle (row R4). That value is 4 as shown in Table 4. Next, the Old Code value is set (block 1324) to be equal to the value of the Old Code value in the previous cycle (row R4). That value is 4 as shown in Table 4. Next, the method proceeds to block 1314 to ascertain whether the New Code for the current cycle (which is an Internal New Code in this case) is less than the MinCounter value (which is 4 in the present example). Since the Internal New Code is 4, the method proceeds to block 1316 to ascertain whether the Internal New Code is in the dictionary, i.e., whether the address location 4 (which is the value of the Internal New Code) has been used. Since address location 4 was employed to store the value 1x1y in row R3, the method proceeds to block 1340 to find the content of the dictionary entry whose address is the New Code value (or 4 in this cycle). The first 3 bits of the Content value (previously the Old Code portion of row R3) are parsed and assigned to the Zero column of row R5 (block 1342). The last 2 bits of the Content value (previously the Char Out portion of row R3) are parsed and assigned to the Char Out column of row R5 (block 1344). Thus, the value 1x is assigned to the Zero column. The Char Out column is assigned value 1y. The method then returns to block 1308 as shown in FIG. 13.
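The dictionary look-up of blocks 1340 through 1344 may be sketched as follows, again modeling the RAM as a Python dict indexed by code value and assuming that the 5-bit Content holds Old Code in its upper 3 bits and Char Out in its lower 2 bits. The function name lookup_step is hypothetical.

```python
CHAR_BITS = 2  # width of Char Out in the Table 4 example


def lookup_step(new_code, ram):
    """Read the entry addressed by new_code; return (zero, char_out)."""
    content = ram[new_code]                      # block 1340
    zero = content >> CHAR_BITS                  # block 1342: first 3 bits
    char_out = content & ((1 << CHAR_BITS) - 1)  # block 1344: last 2 bits
    return zero, char_out


# Dictionary state after rows R3 and R6 of Table 4.
ram = {4: 0b00101, 5: 0b10001}
print(lookup_step(4, ram))  # (1, 1): Zero = 1x, Char Out = 1y (row R5)
print(lookup_step(5, ram))  # (4, 1): Zero = 4, Char Out = 1z (row R8)
```

A nonzero Zero value returned by this step becomes the Internal New Code of the next cycle, so a dictionary code is unwound one character at a time until a primary case is reached.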

With reference to row R6, it is ascertained (block 1308) whether the previous row (i.e., cycle) has a value 0 in the Zero column. Since the Zero column of the previous row (row R5) has the value 1x, the method proceeds to block 1322 to obtain an Internal New Code, which is equal to the Zero column value of the previous cycle (row R5). That value is 1x as shown in Table 4. Next, the Old Code value is set (block 1324) to be equal to the Old Code value in the previous cycle (row R5). That value is 4 as shown in Table 4. Next, the method proceeds to block 1314 to ascertain whether the New Code for the current cycle (which is an Internal New Code in this case) is less than the MinCounter value (which is 4 in the present example). Since the Internal New Code is 1x, which is a primary case (i.e., the value of the Internal New Code is less than the minimum code value, or 4 in this example since the codes are either 4, 5, or 6 as discussed earlier), the method proceeds to block 1326 to put the value 0 into the Zero column. The Char Out value is then set (block 1328) to be equal to New Code (which is 1x in this case). It should be noted that since Char Out is 2 bits and New Code is 3 bits, it is necessary to remove the MSB of New Code to form Char Out. The method then proceeds to block 1330 to increment the counter from its current value, going from 4 to 5. In block 1332, it is ascertained whether the counter has overflowed by the increment step of block 1330. If an overflow occurs, the counter is reset in block 1334. Since the current counter value 5 is not greater than MaxCounter (or 6 in this example), the method proceeds to block 1336 to store the Content value (Old Code+Char Out) into the address location specified by Counter. Thus, the value 41 is stored into address location 5 for row R6. For ease of explanation, this value has been marked with a subscript as 41z in Table 4 (the subscript has no meaning in an actual implementation and is merely an explanatory aid). The method then returns to block 1308 as shown in FIG. 13.

In row R7, it is ascertained whether the previous row (i.e., cycle) has a value 0 in the Zero column (block 1308). Since the Zero column of the previous row (row R6) has the value 0, the Old Code column is set to be equal to the previous External New Code value (block 1310), which is 5 in this case. The New Code value for the current row is received, as shown by the value 6 in row R7 of Table 4 (block 1312). It is then ascertained (block 1314) whether the New Code for the current cycle (which is an External New Code in this case) is less than the MinCounter value (which is 4 in the present example). This is equivalent to checking whether the New Code for the current cycle is a primary value. Since the External New Code is 6, the method proceeds to block 1316 to ascertain whether this New Code is in the dictionary. In this case, the address location 6 has not been used, and the method proceeds to block 1318 to set the value in the Zero column to be equal to the Old Code value (or 5). The Char Out value is set (block 1320) to be equal to the Char Out value of the previous cycle (row R6), or the value 1x. The method then returns to block 1308 as shown in FIG. 13.

With reference to row R8, it is ascertained (block 1308) whether the previous row (i.e., cycle) has a value 0 in the Zero column. Since the Zero column of the previous row (row R7) has the value 5, the method proceeds to block 1322 to obtain an Internal New Code, which is equal to the Zero column value of the previous cycle (row R7). That value is 5 as shown in Table 4. Next, the Old Code value is set (block 1324) to be equal to the Old Code value in the previous cycle (row R7). That value is 5 as shown in Table 4. Next, the method proceeds to block 1314 to ascertain whether the New Code for the current cycle (which is an Internal New Code in this case) is less than the MinCounter value (which is 4 in the present example). Since the Internal New Code is 5, the method proceeds to block 1316 to ascertain whether the Internal New Code is in the dictionary, i.e., whether the address location 5 (which is the value of the Internal New Code) has been used. Since address location 5 was employed to store the value 41z in row R6, the method proceeds to block 1340 to find the content of the dictionary entry whose address is the New Code value (or 5 in this cycle). The first 3 bits of the Content value (previously the Old Code portion of row R6) are parsed and assigned to the Zero column of row R8 (block 1342). The last 2 bits of the Content value (previously the Char Out portion of row R6) are parsed and assigned to the Char Out column of row R8 (block 1344). Thus, the value 4 is assigned to the Zero column. The Char Out column is assigned the value 1z. The method then returns to block 1308 as shown in FIG. 13.

With reference to row R9, it is ascertained (block 1308) whether the previous row (i.e., cycle) has a value 0 in the Zero column. Since the Zero column of the previous row (row R8) has the value 4, the method proceeds to block 1322 to obtain an Internal New Code, which is equal to the Zero column value of the previous cycle (row R8). That value is 4 as shown in Table 4. Next, the Old Code value is set (block 1324) to be equal to the Old Code value in the previous cycle (row R8). That value is 5 as shown in Table 4. Next, the method proceeds to block 1314 to ascertain whether the New Code for the current cycle (which is an Internal New Code in this case) is less than the MinCounter value (which is 4 in the present example). Since the Internal New Code is 4, the method proceeds to block 1316 to ascertain whether the Internal New Code is in the dictionary, i.e., whether the address location 4 (which is the value of the Internal New Code) has been used. Since address location 4 was employed to store the value 1x1y in row R3, the method proceeds to block 1340 to find the content of the dictionary entry whose address is the New Code value (or 4 in this cycle). The first 3 bits of the Content value (previously the Old Code portion of row R3) are parsed and assigned to the Zero column of row R9 (block 1342). The last 2 bits of the Content value (previously the Char Out portion of row R3) are parsed and assigned to the Char Out column of row R9 (block 1344). Thus, the value 1x is assigned to the Zero column. The Char Out column is assigned the value 1y. The method then returns to block 1308 as shown in FIG. 13.

With reference to row R10, it is ascertained (block 1308) whether the previous row (i.e., cycle) has a value 0 in the Zero column. Since the Zero column of the previous row (row R9) has the value 1x, the method proceeds to block 1322 to obtain an Internal New Code, which is equal to the Zero column value of the previous cycle (row R9). That value is 1x as shown in Table 4. Next, the Old Code value is set (block 1324) to be equal to the Old Code value in the previous cycle (row R9). That value is 5 as shown in Table 4. Next, the method proceeds to block 1314 to ascertain whether the New Code for the current cycle (which is an Internal New Code in this case) is less than the MinCounter value (which is 4 in the present example). Since the Internal New Code is 1x, the method proceeds to block 1326 to put the value 0 into the Zero column. The Char Out value is then set (block 1328) to be equal to New Code (which is 1x in this case). It should be noted that since Char Out is 2 bits and New Code is 3 bits, it is necessary to remove the MSB of New Code to form Char Out. The method then proceeds to block 1330 to increment the counter from its current value, going from 5 to 6. In block 1332, it is ascertained whether the counter has overflowed by the increment step of block 1330. If an overflow occurs, the counter is reset in block 1334. Since the current counter value 6 is not greater than MaxCounter (or 6 in this example), the method proceeds to block 1336 to store the Content value (Old Code+Char Out) into the address location specified by Counter. Thus, the value 51 is stored into address location 6 for row R10. The method then returns to block 1308 as shown in FIG. 13.

In row R11, it is ascertained whether the previous row (i.e., cycle) has a value 0 in the Zero column (block 1308). Since the Zero column of the previous row (row R10) has the value 0, the Old Code column is set to be equal to the previous External New Code value (block 1310), which is 6 in this case. The New Code value for the current row is received, as shown by the value 1c in row R11 of Table 4 (block 1312). It is then ascertained (block 1314) whether the New Code for the current cycle (which is an External New Code in this case) is less than the MinCounter value (which is 4 in the present example). Since the External New Code is 1c, the method proceeds to block 1326 to put the value 0 into the Zero column. The Char Out value is then set (block 1328) to be equal to New Code (which is 1c in this case). It should be noted that since Char Out is 2 bits and New Code is 3 bits, it is necessary to remove the MSB of New Code to form Char Out. The method then proceeds to block 1330 to increment the counter from its current value, going from 6 to 7. In block 1332, it is ascertained whether the counter has overflowed by the increment step of block 1330. If an overflow occurs, the counter is reset in block 1334. Since the current counter value 7 is greater than MaxCounter (or 6 in this example), the method resets the counter to MinCounter (or 4 in this example). The method then proceeds to block 1336 to store the Content value (Old Code+Char Out) into the address location specified by Counter. Thus, the value 61 is stored into address location 4 for row R11. Note that in this case, the counter has overflowed and the method simply overwrites the address location 4 (previously used to store the Content value 1x1y in row R3). As will be seen later in this example, this overwriting of the old dictionary entry, while allowing the use of a much smaller RAM to implement the dictionary, still gives the correct decompression result. The method then returns to block 1308 as shown in FIG. 13.
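The primary-case branch (blocks 1326 through 1336), including the counter wrap that overwrites an old dictionary entry, can be sketched as follows. This is an illustrative sketch only: the function name `primary_step` and the use of a Python dict holding (Old Code, Char Out) pairs in place of the dictionary RAM are assumptions, and the subscripts of Table 4 are dropped.

```python
MIN_COUNTER = 4   # smallest non-primary code in this example
MAX_COUNTER = 6   # largest usable code in this example
CHAR_MASK = 0b11  # keeps the 2 low bits, i.e. removes the MSB of a 3-bit code


def primary_step(new_code, old_code, counter, dictionary):
    """Handle a primary New Code; returns (zero, char_out, counter)."""
    zero = 0                            # block 1326
    char_out = new_code & CHAR_MASK     # block 1328
    counter += 1                        # block 1330
    if counter > MAX_COUNTER:           # block 1332
        counter = MIN_COUNTER           # block 1334
    # Block 1336: store Old Code + Char Out, overwriting any older entry.
    dictionary[counter] = (old_code, char_out)
    return zero, char_out, counter
```

Applied to row R11 (New Code 1, Old Code 6, counter at 6), the counter wraps to 4 and the entry at address location 4 is replaced by the pair (6, 1), while address locations 5 and 6 are left untouched.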

In row R12, it is ascertained whether the previous row (i.e., cycle) has a value 0 in the Zero column (block 1308). Since the Zero column of the previous row (row R11) has the value 0, the Old Code column is set to be equal to the previous External New Code value (block 1310), which is 1c in this case. The New Code value for the current row is received, as shown by the value 1d in row R12 of Table 4 (block 1312). It is then ascertained (block 1314) whether the New Code for the current cycle (which is an External New Code in this case) is less than the MinCounter value (which is 4 in the present example). Since the External New Code is 1d, the method proceeds to block 1326 to put the value 0 into the Zero column. The Char Out value is then set (block 1328) to be equal to New Code (which is 1d in this case). It should be noted that since Char Out is 2 bits and New Code is 3 bits, it is necessary to remove the MSB of New Code to form Char Out. The method then proceeds to block 1330 to increment the counter from its current value, going from 4 to 5. In block 1332, it is ascertained whether the counter has overflowed by the increment step of block 1330. If an overflow occurs, the counter is reset in block 1334. Since the current counter value 5 is not greater than MaxCounter (or 6 in this example), the method proceeds to block 1336 to store the Content value (Old Code+Char Out) into the address location specified by Counter. Thus, the value 1c1d is stored into address location 5 for row R12. Again, note that in this case, the method simply overwrites the address location 5 (previously used to store the Content value 41z in row R6). As will be seen later in this example, this overwriting of the old dictionary entry, while allowing the use of a much smaller RAM to implement the dictionary, still gives the correct decompression result. The method then returns to block 1308 as shown in FIG. 13.

In row R13, it is ascertained whether the previous row (i.e., cycle) has a value 0 in the Zero column (block 1308). Since the Zero column of the previous row (row R12) has the value 0, the Old Code column is set to be equal to the previous External New Code value (block 1310), which is 1d in this case. The New Code value for the current row is received, as shown by the value 1e in row R13 of Table 4 (block 1312). It is then ascertained (block 1314) whether the New Code for the current cycle (which is an External New Code in this case) is less than the MinCounter value (which is 4 in the present example). Again, this is equivalent to checking whether the New Code is a primary case. Since the External New Code is 1e, the method proceeds to block 1326 to put the value 0 into the Zero column. The Char Out value is then set (block 1328) to be equal to New Code (which is 1e in this case). It should be noted that since Char Out is 2 bits and New Code is 3 bits, it is necessary to remove the MSB of New Code to form Char Out. The method then proceeds to block 1330 to increment the counter from its current value, going from 5 to 6. In block 1332, it is ascertained whether the counter has overflowed by the increment step of block 1330. If an overflow occurs, the counter is reset in block 1334. Since the current counter value 6 is not greater than MaxCounter (or 6 in this example), the method proceeds to block 1336 to store the Content value (Old Code+Char Out) into the address location specified by Counter. Thus, the value 1d1e is stored into address location 6 for row R13. Again, note that in this case, the method simply overwrites the address location 6 (previously used to store the Content value 51 in row R10). As will be seen later in this example, this overwriting of the old dictionary entry, while allowing the use of a much smaller RAM to implement the dictionary, still gives the correct decompression result. The method then returns to block 1308 as shown in FIG. 13.

In row R14, it is ascertained whether the previous row (i.e., cycle) has a value 0 in the Zero column (block 1308). Since the Zero column of the previous row (row R13) has the value 0, the Old Code column is set to be equal to the previous External New Code value (block 1310), which is 1e in this case. The New Code value for the current row is received, as shown by the value 1f in row R14 of Table 4 (block 1312). It is then ascertained (block 1314) whether the New Code for the current cycle (which is an External New Code in this case) is less than the MinCounter value (which is 4 in the present example). Since the External New Code is 1f, the method proceeds to block 1326 to put the value 0 into the Zero column. The Char Out value is then set (block 1328) to be equal to New Code (which is 1f in this case). It should be noted that since Char Out is 2 bits and New Code is 3 bits, it is necessary to remove the MSB of New Code to form Char Out. The method then proceeds to block 1330 to increment the counter from its current value, going from 6 to 7. In block 1332, it is ascertained whether the counter has overflowed by the increment step of block 1330. If an overflow occurs, the counter is reset in block 1334. In this example, the value 7 is reserved for the EOF flag and thus the maximum value of the code is 6, although the theoretical maximum value of the code would have been 7 (due to its 3-bit length). Using the maximum theoretical value to represent the EOF flag is one convenient way of handling EOF flagging. Since the current counter value 7 is greater than MaxCounter (or 6 in this example), the method resets the counter to MinCounter (or 4 in this example). The method then proceeds to block 1336 to store the Content value (Old Code+Char Out) into the address location specified by Counter. Thus, the value 1e1f is stored into address location 4 for row R14. Note that in this case, the counter has overflowed and the method simply overwrites the address location 4 (previously used to store the Content value 61 in row R11). As will be seen later in this example, this overwriting of the old dictionary entry, while allowing the use of a much smaller RAM to implement the dictionary, still gives the correct decompression result. The method then returns to block 1308 as shown in FIG. 13.
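The relationship among the code width, the EOF flag, and the counter limits in this example can be written out as a short calculation. This is a sketch under stated assumptions: the helper name `code_space` is hypothetical, and taking MinCounter to be the first value above the largest 2-bit character is inferred from the primary-case test rather than stated as a formula in the text.

```python
def code_space(code_bits, char_bits):
    """Return (MinCounter, MaxCounter, EOF code) for the given widths."""
    eof_code = (1 << code_bits) - 1   # maximum theoretical code, reserved for EOF
    max_counter = eof_code - 1        # largest code usable for dictionary entries
    min_counter = 1 << char_bits      # codes below this are primary (raw characters)
    return min_counter, max_counter, eof_code
```

For the 3-bit codes and 2-bit characters of Tables 3 and 4, this gives a MinCounter of 4, a MaxCounter of 6, and an EOF value of 7, which leaves the three dictionary codes 4, 5, and 6.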

In row R15, it is ascertained whether the previous row (i.e., cycle) has a value 0 in the Zero column (block 1308). Since the Zero column of the previous row (row R14) has the value 0, the Old Code column is set to be equal to the previous External New Code value (block 1310), which is 1f in this case. The New Code value for the current row is received, as shown by the value 1g in row R15 of Table 4 (block 1312). It is then ascertained (block 1314) whether the New Code for the current cycle (which is an External New Code in this case) is less than the MinCounter value (which is 4 in the present example). Since the External New Code is 1g, the method proceeds to block 1326 to put the value 0 into the Zero column. The Char Out value is then set (block 1328) to be equal to New Code (which is 1g in this case). It should be noted that since Char Out is 2 bits and New Code is 3 bits, it is necessary to remove the MSB of New Code to form Char Out. The method then proceeds to block 1330 to increment the counter from its current value, going from 4 to 5. In block 1332, it is ascertained whether the counter has overflowed by the increment step of block 1330. If an overflow occurs, the counter is reset in block 1334. Since the current counter value 5 is not greater than MaxCounter (or 6 in this example), the method proceeds to block 1336 to store the Content value (Old Code+Char Out) into the address location specified by Counter. Thus, the value 1f1g is stored into address location 5 for row R15. Again, note that in this case, the method simply overwrites the address location 5 (previously used to store the Content value 1c1d in row R12). As will be seen later in this example, this overwriting of the old dictionary entry, while allowing the use of a much smaller RAM to implement the dictionary, still gives the correct decompression result. The method then returns to block 1308 as shown in FIG. 13.

In row R16, it is ascertained whether the previous row (i.e., cycle) has a value 0 in the Zero column (block 1308). Since the Zero column of the previous row (row R15) has the value 0, the Old Code column is set to be equal to the previous External New Code value (block 1310), which is 1g in this case. The New Code value for the current row is received, as shown by the value 1h in row R16 of Table 4 (block 1312). It is then ascertained (block 1314) whether the New Code for the current cycle (which is an External New Code in this case) is less than the MinCounter value (which is 4 in the present example). Since the External New Code is 1h, the method proceeds to block 1326 to put the value 0 into the Zero column. The Char Out value is then set (block 1328) to be equal to New Code (which is 1h in this case). It should be noted that since Char Out is 2 bits and New Code is 3 bits, it is necessary to remove the MSB of New Code to form Char Out. The method then proceeds to block 1330 to increment the counter from its current value, going from 5 to 6. In block 1332, it is ascertained whether the counter has overflowed by the increment step of block 1330. If an overflow occurs, the counter is reset in block 1334. Since the current counter value 6 is not greater than MaxCounter (or 6 in this example), the method proceeds to block 1336 to store the Content value (Old Code+Char Out) into the address location specified by Counter. Thus, the value 1g1h is stored into address location 6 for row R16. Again, note that in this case, the method simply overwrites the address location 6 (previously used to store the Content value 1d1e in row R13). As will be seen later in this example, this overwriting of the old dictionary entry, while allowing the use of a much smaller RAM to implement the dictionary, still gives the correct decompression result. The method then returns to block 1308 as shown in FIG. 13.

In row R17, it is ascertained whether the previous row (i.e., cycle) has a value 0 in the Zero column (block 1308). Since the Zero column of the previous row (row R16) has the value 0, the Old Code column is set to be equal to the previous External New Code value (block 1310), which is 1h in this case. The New Code value for the current row is received, as shown by the value 0j in row R17 of Table 4 (block 1312). It is then ascertained (block 1314) whether the New Code for the current cycle (which is an External New Code in this case) is less than the MinCounter value (which is 4 in the present example). This is the same as checking whether the current New Code is a primary case. Since the External New Code is 0j, the method proceeds to block 1326 to put the value 0 into the Zero column. The Char Out value is then set (block 1328) to be equal to New Code (which is 0j in this case). It should be noted that since Char Out is 2 bits and New Code is 3 bits, it is necessary to remove the MSB of New Code to form Char Out. The method then proceeds to block 1330 to increment the counter from its current value, going from 6 to 7. In block 1332, it is ascertained whether the counter has overflowed by the increment step of block 1330. If an overflow occurs, the counter is reset in block 1334. Since the current counter value 7 is greater than MaxCounter (or 6 in this example), the method resets the counter to MinCounter (or 4 in this example). The method then proceeds to block 1336 to store the Content value (Old Code+Char Out) into the address location specified by Counter. Thus, the value 1h0j is stored into address location 4 for row R17. Note that in this case, the counter has overflowed and the method simply overwrites the address location 4 (previously used to store the Content value 1e1f in row R14). As will be seen later in this example, this overwriting of the old dictionary entry, while allowing the use of a much smaller RAM to implement the dictionary, still gives the correct decompression result. The method then returns to block 1308 as shown in FIG. 13.

In row R18, it is ascertained whether the previous row (i.e., cycle) has a value 0 in the Zero column (block 1308). Since the Zero column of the previous row (row R17) has the value 0, the Old Code column is set to be equal to the previous External New Code value (block 1310), which is 0j in this case. The New Code value for the current row is received, as shown by the value 6 in row R18 of Table 4 (block 1312). It is then ascertained (block 1314) whether the New Code for the current cycle (which is an External New Code in this case) is less than the MinCounter value (which is 4 in the present example). Since the External New Code is 6, the method proceeds to block 1316 to ascertain whether the External New Code is in the dictionary, i.e., whether the address location 6 (which is the value of the External New Code) has been used. Since address location 6 was employed to store the value 1g1h in row R16, the method proceeds to block 1340 to find the content of the dictionary entry whose address is the New Code value (or 6 in this cycle). The first 3 bits of the Content value (previously the Old Code portion of row R16) are parsed and assigned to the Zero column of row R18 (block 1342). The last 2 bits of the Content value (previously the Char Out portion of row R16) are parsed and assigned to the Char Out column of row R18 (block 1344). Thus, the value 1g is assigned to the Zero column. The Char Out column is assigned the value 1h. Note that the method still decompresses correctly even though the address location 6 has been written over a few times. Note also that the counter is not incremented in this cycle because the Zero column is not zero. The method then returns to block 1308 as shown in FIG. 13.

In row R19, it is ascertained whether the previous row (i.e., cycle) has a value 0 in the Zero column (block 1308). Since the Zero column of the previous row (row R18) has the value 1g, the method proceeds to block 1322 to obtain an Internal New Code, which is equal to the Zero column value of the previous cycle (row R18). That value is 1g as shown in Table 4. Next, the Old Code value is set (block 1324) to be equal to the Old Code value in the previous cycle (row R18). That value is 0j as shown in Table 4. Next, the method proceeds to block 1314 to ascertain whether the New Code for the current cycle (which is an Internal New Code in this case) is less than the MinCounter value (which is 4 in the present example). Since the Internal New Code is 1g, the method proceeds to block 1326 to put the value 0 into the Zero column. The Char Out value is then set (block 1328) to be equal to New Code (which is 1g in this case). It should be noted that since Char Out is 2 bits and New Code is 3 bits, it is necessary to remove the MSB of New Code to form Char Out. The method then proceeds to block 1330 to increment the counter from its current value, going from 4 to 5. In block 1332, it is ascertained whether the counter has overflowed by the increment step of block 1330. If an overflow occurs, the counter is reset in block 1334. Since the current counter value 5 is not greater than MaxCounter (or 6 in this example), the method proceeds to block 1336 to store the Content value (Old Code+Char Out) into the address location specified by Counter. Thus, the value 01 is stored into address location 5 for row R19. The method then returns to block 1308 as shown in FIG. 13.

In row R20, it is ascertained whether the previous row (i.e., cycle) has a value 0 in the Zero column (block 1308). Since the Zero column of the previous row (row R19) has the value 0, the Old Code column is set to be equal to the previous External New Code value (block 1310), which is 6 in this case (from row R18). The New Code value for the current row is received, as shown by the value 1k in row R20 of Table 4 (block 1312). It is then ascertained (block 1314) whether the New Code for the current cycle (which is an External New Code in this case) is primary, i.e., less than the MinCounter value (which is 4 in the present example). Since the External New Code is 1k, the method proceeds to block 1326 to put the value 0 into the Zero column. The Char Out value is then set (block 1328) to be equal to New Code (which is 1k in this case). It should be noted that since Char Out is 2 bits and New Code is 3 bits, it is necessary to remove the MSB of New Code to form Char Out. The method then proceeds to block 1330 to increment the counter from its current value, going from 5 to 6. In block 1332, it is ascertained whether the counter has overflowed by the increment step of block 1330. If an overflow occurs, the counter is reset in block 1334. Since the current counter value 6 is not greater than MaxCounter (or 6 in this example), the method proceeds to block 1336 to store the Content value (Old Code+Char Out) into the address location specified by Counter. Thus, the value 61k is stored into address location 6 for row R20. Again, note that in this case, the method simply overwrites the address location 6 (previously used to store the Content value 1g1h in row R16). As will be seen later in this example, this overwriting of the old dictionary entry, while allowing the use of a much smaller RAM to implement the dictionary, still gives the correct decompression result. The method then returns to block 1308 as shown in FIG. 13.

In row R21, it is ascertained whether the previous row (i.e., cycle) has a value 0 in the Zero column (block 1308). Since the Zero column of the previous row (row R20) has the value 0, the Old Code column is set to be equal to the previous External New Code value (block 1310), which is 1k in this case. The New Code value for the current row is received, as shown by the value 1m in row R21 of Table 4 (block 1312). It is then ascertained (block 1314) whether the New Code for the current cycle (which is an External New Code in this case) is primary, i.e., less than the MinCounter value (which is 4 in the present example). Since the External New Code is 1m, the method proceeds to block 1326 to put the value 0 into the Zero column. The Char Out value is then set (block 1328) to be equal to New Code (which is 1m in this case). It should be noted that since Char Out is 2 bits and New Code is 3 bits, it is necessary to remove the MSB of New Code to form Char Out. The method then proceeds to block 1330 to increment the counter from its current value, going from 6 to 7. In block 1332, it is ascertained whether the counter has overflowed by the increment step of block 1330. If an overflow occurs, the counter is reset in block 1334. Since the current counter value 7 is greater than MaxCounter (or 6 in this example), the method resets the counter to MinCounter (or 4 in this example). The method then proceeds to block 1336 to store the Content value (Old Code+Char Out) into the address location specified by Counter. Thus, the value 1k1m is stored into address location 4 for row R21. Note that in this case, the counter has overflowed and the method simply overwrites the address location 4 (previously used to store the Content value 1h0j in row R17). As will be seen later in this example, this overwriting of the old dictionary entry, while allowing the use of a much smaller RAM to implement the dictionary, still gives the correct decompression result. The method then returns to block 1308 as shown in FIG. 13.

In row R22, it is ascertained whether the previous row (i.e., cycle) has a value 0 in the Zero column (block 1308). Since the Zero column of the previous row (row R21) has the value 0, the Old Code column is set to be equal to the previous External New Code value (block 1310), which is 1m in this case. The New Code value for the current row is received, as shown by the value 1n in row R22 of Table 4 (block 1312). It is then ascertained (block 1314) whether the New Code for the current cycle (which is an External New Code in this case) is primary, i.e., less than the MinCounter value (which is 4 in the present example). Since the External New Code is 1n, the method proceeds to block 1326 to put the value 0 into the Zero column. The Char Out value is then set (block 1328) to be equal to New Code (which is 1n in this case). It should be noted that since Char Out is 2 bits and New Code is 3 bits, it is necessary to remove the MSB of New Code to form Char Out. The method then proceeds to block 1330 to increment the counter from its current value, going from 4 to 5. In block 1332, it is ascertained whether the counter has overflowed by the increment step of block 1330. If an overflow occurs, the counter is reset in block 1334. Since the current counter value 5 is not greater than MaxCounter (or 6 in this example), the method proceeds to block 1336 to store the Content value (Old Code+Char Out) into the address location specified by Counter. Thus, the value 1m1n is stored into address location 5 for row R22. Again, note that in this case, the method simply overwrites the address location 5 (previously used to store the Content value 01 in row R19). As will be seen later in this example, this overwriting of the old dictionary entry, while allowing the use of a much smaller RAM to implement the dictionary, still gives the correct decompression result. The method then returns to block 1308 as shown in FIG. 13.

In row R23, the end-of-file (EOF) marker is encountered. Decompression is finished except for final processing, as discussed below.

As mentioned earlier, the values in the Char Out column are further processed in order to obtain the decompressed output value (Decomp Output). In one embodiment, the value 0 in the Zero column signals that decompression for the current External New Code value is finished. Since decompression may yield a set of output values for each External New Code value received, each set of output values produced for each External New Code value received is considered a group. These groups are shown in Table 4 as groups G1–G15. Note that groups G2, G3, G4, and G12 have multiple values in each group. As the Char Out values are obtained for each group, they are inputted into a temporary memory space. Once decompression is finished for that group, the Char Out values for that group are outputted in the reverse order such that the Char Out value received first is output last, and vice versa. With reference to the group G3, for example, the Char Out values are produced in the order 1a, 1y, and 1x. Outputting to the Decomp Output column is accomplished for this group G3 by reversing the order so that the order now reads 1x, 1y, and 1a for rows R4, R5, and R6 respectively. Similarly, the group G4 is reversed to output, in the Decomp Output column, the values 1x, 1y, 1z, and 1x for rows R7, R8, R9, and R10 respectively. One skilled in the art will readily recognize that reversing the order for each group may be accomplished using any technique, including using a First-In-Last-Out queue.
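The per-group reversal can be illustrated with a simple First-In-Last-Out stack. This is a minimal sketch; the function name `flush_group` is hypothetical, and a Python list stands in for the temporary memory space.

```python
def flush_group(char_outs):
    """Return one group's Char Out values in reverse order of arrival."""
    stack = []
    for value in char_outs:      # values arrive one per cycle
        stack.append(value)      # push into the temporary memory space
    decomp_output = []
    while stack:
        decomp_output.append(stack.pop())  # last value in is first value out
    return decomp_output
```

For group G3, the Char Out values arrive as 1a, 1y, 1x and leave as 1x, 1y, 1a, matching the Decomp Output described for rows R4 through R6.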

With reference to Tables 3 and 4, when one ignores the subscripts, it should be apparent that the Decomp Output values of Table 4 are identical to the values in the Input Character column of Table 3. This shows that the improved HSO compression technique of the present invention can correctly compress and decompress data even with a small dictionary.

It should be kept in mind that although the input characters in the example of Tables 3 and 4 are 2 bits each, the inventive HSO compression technique can compress input characters having any size. As can be appreciated from the foregoing, the inventive HSO technique has many unique features and advantages. For example, the invention allows the use of a smaller number of output codes for compression, far fewer than the number of codes that would have been required if one unique code were allocated for each unique bit pattern that needs to be represented during compression. As the code overflows, it resets to its minimum value to allow compression to continue. This is seen during, for example, the compression of row R11 in the compression example of Table 3. The reduction in the number of unique output codes required in turn allows the use of a smaller dictionary during compression, which reduces the memory requirement. This is advantageous whether the compression logic is implemented in hardware or software.

Another unique feature in the inventive combination that is the present inventive HSO compression technique relates to the use of a small shadow memory to track the associated pairings between a CAM content value and its associated CAM address to allow a previously used CAM address to be freed up when the counter (code output) overflows the imposed maximum value. As discussed earlier, the shadow memory addresses are the counter values, and the content stored at each address in the shadow memory is the CAM address currently used to store the counter value that forms the shadow memory address. This shadow memory advantageously allows the CAM address to be rapidly ascertained for any given counter value. This is seen, for example, during the compression of row R11 in the compression example of Table 3. The use of the shadow memory advantageously makes the compression process more efficient when a smaller number of output codes is used, far fewer than the number of codes that would have been required if one unique code were allocated for each unique bit pattern that needs to be represented during compression.

Another unique feature in the inventive combination that is the present inventive HSO compression technique relates to the way the special cases are handled when it is realized that the current search bit pattern is the same as the search bit pattern that is associated with the next input character during compression. This is seen during the compression of, for example, R12, R13, R14, R15, and R21 R11 in the compression example of Table 3. When a special case is encountered, the inventive HSO compression technique simply increments the counter (if such does not cause the counter to overflow) and outputs the first part of the current search bit pattern. Neither the CAM nor the shadow memory is updated, which saves processing cycles.

To enable the handling of the special cases during compression, the inventive HSO compression technique, in one embodiment, advantageously employs an input buffer and a pipelined input structure in order to have multiple input characters available for examination and detection of the special cases. This is another unique feature in combination with other features of the inventive HSO compression technique.

Even the CAM is structured in a unique, memory-saving manner that ensures processing efficiency. In one embodiment, the CAM only stores the counter values (output codes), with the CAM address representing the current bit pattern to search. To signal whether a given CAM address is employed or free, one or more tag bits may be provided with each CAM address location. One tag bit suffices to indicate whether a given CAM address is used. In one embodiment, multiple tag bits allow the tag bits to be cycled through when the dictionary is reused for compressing the next burst. For example, at the end of compression of a particular burst, the dictionary is then cleared for compressing the next burst (which may belong to another process and/or data flow). If a CAM is furnished with, for example, two tag bits T1 and T2 for each CAM address to mark whether the CAM address is currently used, and tag bit T1 was used in the compression of the previous burst, the CAM can be used immediately for compression of the next burst by utilizing tag bit T2. Of course, it is possible to provide more than two tag bit fields if desired for higher bandwidth. Alternatively or additionally, multiple CAM arrays (with one or multiple tag bit fields) can be provided. The CAMs can be employed in a ping-pong fashion to store the dictionaries associated with consecutive input sequences. Thus, if two CAMs are provided, the first CAM will be used to store the dictionary associated with the first input sequence, the second CAM will be used to store the dictionary associated with the second input sequence (in this sense, the input sequence refers to a pattern of incoming bits comprising one or more frames or packets that is associated with a single flow or file and can be compressed together), and the first CAM will be used again to store the dictionary associated with the next (third) input sequence, and so on (e.g., the second CAM used for the dictionary associated with the next (fourth) input sequence).

When one CAM is currently employed for storing the dictionary, the other CAM can be reset (e.g., by rewriting the tag field or tag fields) to get that CAM ready for use with the next input sequence. Thus, the compression process does not have to be interrupted in order to reset a CAM. One skilled in the art will recognize that three or more CAMs can be used in a round-robin fashion to achieve the same purpose if two CAMs cannot satisfy the bandwidth requirement.

In one embodiment, the end of burst (EOF) is signaled to the compression logic using a unique bit pattern. This end of burst signal may be created by, for example, the input interface of the data optimization engine. The input interface is endowed with knowledge regarding the protocol employed to transmit the data and therefore would know where the burst ends and where the next burst begins in the data stream. By using a special end of burst (EOF) signal, it is unnecessary for the compression engine to know in advance how long the burst is. This allows compression to be truly flexible and adaptive with regard to how long the burst can be, further extending the flexibility of the inventive HSO compression technique (which is flexible and adaptive with regard to what type of data is received).

With regard to the decompression logic, the ability to use a small number of address locations in the dictionary to decompress advantageously allows the dictionary to be quite small. In the example of Table 4, for example, the dictionary has only three addresses: 4, 5, and 6. Unique in the combination that is the inventive HSO decompression technique is the ability to overwrite existing memory locations when the counter overflows. This overwrite feature is seen, for example, during the decompression of row R11 in the example of Table 4 when the counter overflows and is reset to 4. In this case, the address location 4 is simply overwritten with the new content value.
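The overwrite behavior may be illustrated with a short sketch that models only the counter wrap and the dictionary write, not the full HSO decompression method. The class name `SmallDictionary` is hypothetical, the constants follow the example of Table 4 (MinCounter 4, MaxCounter 6), and the initial counter value is an assumption chosen so that the first entry lands at address 4.

```python
MIN_COUNTER = 4   # first dictionary address in the example of Table 4
MAX_COUNTER = 6   # last dictionary address in the example of Table 4


class SmallDictionary:
    """Dictionary addressed by the counter; old entries are overwritten on wrap."""

    def __init__(self):
        self.counter = MIN_COUNTER - 1   # assumption: first store goes to address 4
        self.ram = {}

    def store(self, content):
        self.counter += 1                # increment (block 1330)
        if self.counter > MAX_COUNTER:   # overflow check and reset (blocks 1332, 1334)
            self.counter = MIN_COUNTER
        self.ram[self.counter] = content # overwrite whatever was stored (block 1336)
        return self.counter


d = SmallDictionary()
addresses = [d.store(c) for c in ("c0", "c1", "c2", "c3")]
print(addresses, d.ram)
```

However many entries are stored, the dictionary never holds more than three locations, which is what permits a small RAM.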

The overwrite ability and the use of the counter value as addresses into the decompression dictionary allow the inventive decompression logic to be implemented with a minimal memory requirement, which is advantageous irrespective of whether the decompression logic is implemented in software or hardware. Minimizing the memory requirement both increases the processing speed and reduces the complexity and size of the decompression logic. In one embodiment, the reduction in the size of the memory allows the decompression dictionary to be implemented using simple random access memory (RAM), with the attendant benefits of higher speed and reduced complexity and power consumption. The smaller memory requirement also makes it possible to design the dictionary memory using special high speed custom logic in an economical manner, which facilitates high speed decompression to keep up with higher data rate requirements.

In one embodiment, the end of burst (EOF) is signaled to the decompression logic using a unique bit pattern. This end of burst signal may be created by, for example, the input interface of the data optimization engine. Alternatively, the decompression engine may simply utilize the end of burst (EOF) signal provided by the compression circuitry when the packet or data frame was compressed earlier. As in the compression case, the input interface is endowed with knowledge regarding the protocol employed to transmit the data and therefore would know where the burst ends and where the next burst begins in the data stream. By using a special end of burst (EOF) signal, it is unnecessary for the decompression engine to know in advance how long the burst is. This allows decompression to be truly flexible and adaptive with regard to how long the burst can be, further extending the flexibility of the inventive HSO decompression technique (which is flexible and adaptive in that no prior knowledge of the dictionary is required for decompression of any type of compressed data).

Also unique in the combination that is the inventive HSO decompression technique is the reshuffling feature that allows the output to be properly ordered to restore the original uncompressed stream. With reference to the example of Table 4, this reshuffling process is seen within each group G1–G15, which process reshuffles the Char Out values to derive the Decomp Output. As can be seen by a comparison with the compression input stream, Decomp Output is an exact copy of the original uncompressed data stream.

FIG. 14 shows, in accordance with one embodiment of the present invention, a data optimization engine 1402, which receives an incoming data stream on a communication channel 1404A, optimizes the optimizable data frames in the incoming data stream, and passes the optimized data frames, along with the data frames that cannot be optimized, out via a communication channel 1404B. In the reverse direction, data optimization engine 1402 receives an incoming data stream on a communication channel 1406A that may contain data frames previously optimized. Data optimization engine 1402 then de-optimizes the previously optimized data frames in the incoming data stream received at communication channel 1406A, and passes the de-optimized data out on a communication channel 1406B. Furthermore, data frames previously unoptimized are bypassed directly from communication channel 1406A to communication channel 1406B by data optimization engine 1402.

In FIG. 14, data optimization engine 1402 comprises a transmit interface circuit 1408, an optimization processor 1410, and a receive interface circuit 1412. Transmit interface circuit 1408 couples on the left side (transmit side) of FIG. 14 to a transmit-side SERDES (Serializer/Deserializer) 1420, and on the right side of FIG. 14 (receive side) to a receive-side SERDES 1422. Transmit-side SERDES 1420 receives the serial incoming data stream on communication channel 1404A, and converts the incoming serial data to a parallel data format to be transmitted to transmit interface circuit 1408 via a 10-bit bus 1424. Transmit interface circuit 1408 performs data alignment on the data frames of the incoming data stream, separates the optimizable data frames from the non-optimizable data frames, and bypasses the non-optimizable data frames out to receive-side SERDES 1422 to be output on communication channel 1404B. Transmit interface circuit 1408 also performs data parsing on the optimizable data frames in the incoming data stream (received on communication channel 1404A), thus separating the optimizable portion of a data frame from the non-optimizable portion. The data in the optimizable portion is then translated or adapted by transmit interface circuit 1408 to a protocol or format that is suitable for optimization by optimization processor 1410.

With reference to FIG. 14, the optimizable portion of the optimizable data frame is sent from transmit interface circuit 1408 to optimization processor 1410 via a bus 1426. After the optimizable portion of the data frame is optimized, the now-optimized optimizable portion is received at transmit interface circuit 1408 via a bus 1430 to be reassembled by transmit interface circuit 1408 with the non-optimizable portions of the optimizable data frame for retransmission onward, via a bus 1428, to receive-side SERDES 1422 and communication channel 1404B.

Furthermore, transmit interface circuit 1408 performs congestion control to ensure that if incoming data frames arrive in rapid bursts on communication channel 1404A, optimization processor 1410 is not swamped, and can have time to perform the optimization task on the optimizable portions of the optimizable data frames. While optimization processor 1410 performs its optimization task on the optimizable portion of the optimizable data frames, transmit interface circuit 1408 also performs traffic handling to ensure that meaningful data appears on communication channel 1404B (if required by the protocol on communication channel 1404B) so as to render the data optimization engine transparent to the receiving device.

Receive interface circuit 1412 couples on the left side (transmit side) of FIG. 14 to a transmit-side SERDES 1460, and on the right side of FIG. 14 (receive side) to a receive-side SERDES 1462. Receive-side SERDES 1462 receives the serial incoming data stream on communication channel 1406A, and converts the incoming serial data to a parallel data format to be transmitted to receive interface circuit 1412 via a 10-bit bus 1464. The incoming data stream may contain both non-optimized data frames, as well as data frames previously optimized by another data optimization engine.

Receive interface circuit 1412 performs data alignment on the data frames of the incoming data stream, separates the de-optimizable data frames (i.e., those that were previously optimized and now need to be decompressed and/or decrypted) from those that do not need de-optimization, and bypasses those data frames that do not need de-optimization out to transmit-side SERDES 1460 to be output on communication channel 1406B. Receive interface circuit 1412 also performs data parsing on the de-optimizable data frames in the incoming data stream (received on communication channel 1406A), thus separating the de-optimizable portion of a data frame from the non-de-optimizable portion. The data in the de-optimizable portion is then translated or adapted by receive interface circuit 1412 to a protocol or format that is suitable for de-optimization by optimization processor 1410 (which performs the de-optimization for data received from receive interface circuit 1412 as discussed later herein).

With reference to FIG. 14, the de-optimizable portion of the de-optimizable data frame is sent from receive interface circuit 1412 to optimization processor 1410 via a bus 1490. After the de-optimizable portion of the data frame is de-optimized, the now-de-optimized portion is received at receive interface circuit 1412 via a bus 1492 to be reassembled by receive interface circuit 1412 with the non-de-optimizable portion of the de-optimizable data frame for retransmission onward, via a bus 1466, to transmit-side SERDES 1460 and communication channel 1406B. Furthermore, receive interface circuit 1412 performs congestion control to ensure that if incoming data frames arrive in rapid bursts on communication channel 1406A, optimization processor 1410 is not swamped, and can have time to perform the de-optimization task on the de-optimizable portions of the de-optimizable data frames. While optimization processor 1410 performs its de-optimization tasks on the de-optimizable portion of the de-optimizable data frames, receive interface circuit 1412 also performs traffic handling to ensure that meaningful data appears on communication channel 1406B (if required by the protocol on communication channel 1406B) so as to render the data optimization engine transparent to the receiving device.

In the following figures, a data optimization engine configured to optimize data having the Fiber Channel (FC) protocol is discussed in detail. To facilitate discussion of the Fiber Channel implementation of data optimization engine 1402, a review of the frame format of a Fiber Channel data frame may be in order. Referring now to FIG. 15, there is shown a typical Fiber Channel data frame 1502. Adjacent Fiber Channel data frames 1502 are typically separated from one another by one or more primitive signals (an Idle word is a type of primitive signal word). Further information regarding these primitive signal words may be obtained from the aforementioned Kembel text. Generally, there is a minimum of six primitive signal words between the end of one Fiber Channel data frame 1502 and the start of the next Fiber Channel data frame. These primitive signal words are shown in FIG. 15 as primitive signal words 1504. A start-of-frame (SOF) delimiter 1510, which is typically negative in polarity, is 40 bits long and defines the start of Fiber Channel data frame 1502. There are six 40-bit words defining frame header 1512 adjacent to start-of-frame delimiter 1510. Following frame header 1512, there may be up to 528 4-byte words of payload (or up to 2,112 bytes of payload). This is shown as data payload 1514 in FIG. 15. The payload may also include optional header data, which reduces the actual payload capacity. Additional information regarding the Fiber Channel protocol may be obtained from the Kembel reference. Following data payload 1514, there is one 40-bit CRC (Cyclic Redundancy Check) word, followed by an end-of-frame delimiter, which is also 40 bits long. These are shown as CRC 1520 and end-of-frame (EOF) delimiter 1522 respectively in FIG. 15. With respect to polarity, as is well known to those familiar with the Fiber Channel specification, each 40-bit word in Fiber Channel data frame 1502 may have a different polarity.
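The frame layout just described lends itself to a quick size check. The sketch below simply adds up the 40-bit words named above; the constant and function names are illustrative and do not come from the Fiber Channel specification.

```python
WORD_BITS = 40            # each encoded word on the link is 40 bits long
SOF_WORDS = 1             # start-of-frame delimiter 1510
HEADER_WORDS = 6          # frame header 1512
MAX_PAYLOAD_WORDS = 528   # data payload 1514, 4 bytes of data per word
CRC_WORDS = 1             # CRC 1520
EOF_WORDS = 1             # end-of-frame delimiter 1522


def frame_length_words(payload_words):
    """Number of 40-bit words in one frame carrying `payload_words` of payload."""
    if not 0 <= payload_words <= MAX_PAYLOAD_WORDS:
        raise ValueError("payload must be between 0 and 528 words")
    return SOF_WORDS + HEADER_WORDS + payload_words + CRC_WORDS + EOF_WORDS


max_payload_bytes = MAX_PAYLOAD_WORDS * 4
max_frame_bits = frame_length_words(MAX_PAYLOAD_WORDS) * WORD_BITS
print(max_payload_bytes, frame_length_words(MAX_PAYLOAD_WORDS), max_frame_bits)
```

A full-size frame therefore occupies 537 forty-bit words on the link, not counting the primitive signal words between frames.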

In FIG. 16, an Idle word 1600, representing a type of primitive signal word, is shown. As mentioned earlier, each primitive signal word is 40 bits long and organized into four 10-bit words. The first 10-bit word of primitive signal word 1504 bears a special code K28.5 (shown by reference 1602 in FIG. 16). The Fiber Channel specification requires that all 10 bits of the K28.5 word be located within a single 40-bit word. To put it differently, the 10-bit K28.5 word cannot be split among adjacent 40-bit words. Following the K28.5 10-bit word, there are three other 10-bit words shown in FIG. 16 by reference numbers 1604, 1606, and 1608 respectively. As there are different primitive signal words, the content of the three 10-bit words that follow the K28.5 10-bit word may vary. Furthermore, start-of-frame delimiter 1510 and end-of-frame delimiter 1522 also start with a K28.5 10-bit word. As in the case of primitive signal words, the next three 10-bit words of a start-of-frame delimiter 1510 or an end-of-frame delimiter 1522 may vary in content as there are different start-of-frame delimiters 1510 and end-of-frame delimiters 1522 specified for each class.

FIG. 17 shows, in accordance with one embodiment of the present invention, a transmit interface circuit 1702 in greater detail. As discussed in connection with FIG. 14, the incoming serial data stream is converted by the transmit-side SERDES (1420 in FIG. 14) to 10-bit words and received at bus 1424. Generally speaking, bus 1424 is a parallel bus, but it may also be a high-speed serial bus, for example. If bus 1424 is a 10-bit parallel bus, bus 1424 typically operates at between about 100 MHz and about 125 MHz to yield a data rate of roughly one gigabit per second or slightly above. In the case of Fiber Channel data, bus 1424, as a 10-bit parallel bus, may run at roughly 106 MHz. In the case of gigabit Ethernet data (which is not the case in FIG. 17), bus 1424 may run at, for example, 125 MHz.
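The relationship between bus width, bus clock, and line rate mentioned above is simple multiplication. The helper below, whose name is illustrative, works it out for the two clock rates cited.

```python
def line_rate_gbps(bus_width_bits, clock_mhz):
    """Raw rate carried by a parallel bus, in gigabits per second."""
    return bus_width_bits * clock_mhz / 1000.0


for name, mhz in (("Fiber Channel", 106), ("gigabit Ethernet", 125)):
    print(f"{name}: {line_rate_gbps(10, mhz):.2f} Gbit/s on a 10-bit bus")
```

The same helper shows why framing to 40-bit words relaxes timing: a 40-bit path needs only a quarter of the clock rate to carry the same traffic.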

A FIFO 1710 converts the 10-bit data on bus 1424 into 40-bit data. Besides performing framing of the incoming data from 10 bits to 40 bits, FIFO 1710 also acts as a shock absorber to absorb data bursts coming in via bus 1424. Framing the incoming data as 40-bit words allows transmit interface circuit 1702 to operate on a longer word, thereby enabling transmit interface circuit 1702 to operate at a lower clock speed while still maintaining a high throughput. Framing the incoming data as 40-bit words also makes it simpler to perform frame alignment in frame alignment circuit 1712.

Frame alignment circuit 1712 looks for the 10-bit K28.5 word within each 40-bit word. If it finds the 10-bit K28.5 word, that 10-bit K28.5 word and the next three 10-bit words are considered, as a 40-bit word unit, to be either an FC fill 40-bit word (1504 in FIG. 15), a start-of-frame delimiter (1510 in FIG. 15), or an end-of-frame delimiter (1522 in FIG. 15). Using the start of the 10-bit K28.5 word to frame the 40-bit words received into transmit interface circuit 1702 accomplishes frame alignment by ensuring that the beginning of the start-of-frame delimiter 1510 can be accurately framed, or aligned, with respect to a reference 40-bit word. Consequently, the frame header 1512, as well as payload 1514, can also be properly framed with respect to reference 40-bit words and analyzed.
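The alignment step may be sketched in software as follows. This is a simplified model, not the circuit itself: it assumes the input is already a list of 10-bit words, represents each code group as an integer with the first transmitted bit as the most significant bit, and uses the two standard 8b/10b encodings of K28.5 (one per running disparity). The function name `align` is hypothetical.

```python
# K28.5 in its two running-disparity forms (integer bit order is an assumption).
K28_5 = {0b0011111010, 0b1100000101}


def align(ten_bit_words):
    """Group 10-bit words into 4-word units starting at the first K28.5 word."""
    for start, word in enumerate(ten_bit_words):
        if word in K28_5:
            break
    else:
        return []                          # no K28.5 seen yet: nothing can be aligned
    aligned = ten_bit_words[start:]
    # Emit only complete 40-bit units; a trailing partial unit waits for more data.
    return [tuple(aligned[i:i + 4]) for i in range(0, len(aligned) - 3, 4)]


stream = [0x155, 0x0FA, 1, 2, 3, 0x305, 4, 5, 6, 7]
print(align(stream))
```

Once the first K28.5 word fixes the boundary, every later 40-bit unit, including header and payload words, falls on the same boundary.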

After frame alignment circuit 1712 frames the incoming data stream, the 40-bit words are passed to traffic controller circuit 1714 for further processing. Traffic controller circuit 1714 receives the 40-bit words from frame alignment circuit 1712, and ascertains whether a received 40-bit word is a primitive signal word, a start-of-frame delimiter, one of the frame header 40-bit words, part of the data payload, a 40-bit CRC word, or a 40-bit end-of-frame delimiter. Since the primitive signal words and the start-of-frame delimiter are aligned with 40-bit reference words by frame alignment circuit 1712, the parsing of a Fiber Channel data frame into its constituent parts can be achieved with the knowledge of the relative positions of each 40-bit word in the Fiber Channel data frame, both relative to one another and relative to the start-of-frame delimiter and the end-of-frame delimiter (as discussed in FIG. 15).

FIG. 18 illustrates, in accordance with one embodiment of the present invention, a flowchart showing how traffic controller circuit 1714 may process each 40-bit word received from frame alignment circuit 1712. As each 40-bit word is received from frame alignment circuit 1712, traffic controller circuit 1714 first checks to see whether the first 10 bits of that 40-bit word are a 10-bit K28.5 word. This is shown in block 1802 of FIG. 18. If the first 10 bits of the incoming 40-bit word from frame alignment circuit 1712 are not a 10-bit K28.5 word, that 40-bit word must be either one of the frame header 40-bit words (1512 in FIG. 15), part of the data payload (1514 in FIG. 15), or a 40-bit CRC word (1520 in FIG. 15).

In this case, the 40-bit word is passed to an optimizable portion parser (block 1804 of FIG. 18), which ascertains whether the 40-bit word received is part of the optimizable portion of the Fiber Channel data frame, or part of the non-optimizable portion of the Fiber Channel data frame. In one preferred embodiment, only the data payload (1514 of FIG. 15) is optimizable, i.e., eligible to be processed further via compression and/or encryption by optimization processor 1410. In another embodiment, even a whole or a portion of the frame header (1512 of FIG. 15), and/or the CRC 40-bit word (1520 of FIG. 15) may also be eligible to be optimized further via compression or encryption by optimization processor 1410. Typically, however, when only the payload is optimized, the CRC is recalculated by transmit interface circuit 1702 for each Fiber Channel data frame that has been optimized and thus the CRC does not need to be optimized. Irrespective of the specific implementation of the optimizable portion parser, the 40-bit word deemed to be part of the non-optimizable portion is allowed to bypass directly to the output of transmit interface circuit 1702 while the optimizable portion is further processed.

In one embodiment, the header and/or payload is further analyzed to determine if the Fiber Channel data frame should not be optimized (in some cases, one or more fields in the header may indicate that this particular Fiber Channel data frame should not be optimized). In this case, even the optimizable portion (i.e., the portion eligible to be compressed and/or encrypted by optimization processor 1410) would also be bypassed directly to the output of transmit interface circuit 1702 via bus 1722, thereby allowing the payload, header, and/or CRC portions of the Fiber Channel data frame to transparently pass through transmit interface circuit 1702 without modification or significant processing. If the header and/or payload do not indicate that the Fiber Channel data frame under consideration should not be optimized, the optimizable portion is then passed on to optimization front-end circuit 1720 (shown in FIG. 17) for further processing. On the other hand, if the first 10 bits of the 40-bit word received from frame alignment circuit 1712 are indeed a 10-bit K28.5 word, this 40-bit word is either a primitive signal word, a start-of-frame delimiter, or an end-of-frame delimiter. If the received 40-bit word is a primitive signal word (as ascertained in block 1810 of FIG. 18), the primitive signal word is bypassed directly to the output of transmit interface circuit 1702 via bypass bus 1722.

In one embodiment, traffic controller circuit 1714 monitors a threshold level at output FIFO 1724 (see FIG. 17) and outputs additional Idle words (or one of the fill words) to output FIFO 1724 to essentially cause output FIFO 1724 to output Idle words from transmit interface circuit 1702. In one embodiment, two fill words are output whenever the fill level of output FIFO 1724 is below the threshold. This is useful since the Fiber Channel protocol expects there to be protocol-acceptable data on the communication channel at all times. If optimization processor 1410 is busy optimizing a particularly long Fiber Channel data frame, traffic controller circuit 1714 fills the communication channel with protocol-acceptable data instead of allowing gibberish data to appear on the communication channel. In one embodiment, the Idle words may come from the output FIFO 1724 itself (as opposed to from the traffic controller circuit). The threshold within output FIFO 1724 that triggers the output of additional Idle words may be set via software during configuration or execution, or may be adaptively changed based on the traffic pattern and bandwidth usage pattern of the incoming data stream.
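The threshold-triggered fill may be pictured with a small sketch. It assumes the monitored quantity is the number of 40-bit words waiting in the output FIFO; the function name `top_up` and its parameters are hypothetical, and the default of two fill words follows the embodiment mentioned above.

```python
from collections import deque


def top_up(output_fifo, threshold, idle_word, fill_count=2):
    """Append fill words when the FIFO holds fewer words than the threshold."""
    if len(output_fifo) < threshold:
        output_fifo.extend([idle_word] * fill_count)
    return output_fifo


fifo = deque(["DATA"])
top_up(fifo, threshold=2, idle_word="IDLE")   # below threshold: two Idle words added
top_up(fifo, threshold=2, idle_word="IDLE")   # at or above threshold: unchanged
print(list(fifo))
```

Because the threshold is an ordinary parameter, it can be changed at configuration time or adapted to the observed traffic without touching the fill logic.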

On the other hand, if it is ascertained in block 1810 that the incoming 40-bit word starts with a 10-bit K28.5 word but that 40-bit word is not a primitive signal word, a further decision is made in block 1812, which ascertains whether the incoming 40-bit word is the start-of-frame delimiter or the end-of-frame delimiter. If the incoming 40-bit word is ascertained in block 1812 to be a start-of-frame delimiter, this start-of-frame delimiter is immediately bypassed to the output of transmit interface circuit 1702 via bypass bus 1722. On the other hand, if it is ascertained in block 1812 that the incoming 40-bit word is an end-of-frame delimiter, the end-of-frame delimiter is held by traffic controller circuit 1714 until traffic controller circuit 1714 receives a signal from an end-of-optimized-data-flag-handler circuit 1740 (see FIG. 17) that indicates that traffic controller circuit 1714 can release a polarity-correct version of the end-of-frame delimiter to the output of transmit interface circuit 1702. This is shown in blocks 1816, 1818, and 1820 of FIG. 18 respectively. Furthermore, the end-of-frame delimiter is also bypassed to the output of transmit interface circuit 1702 if it turns out that the optimizable portion belongs to a Fiber Channel data frame that has been marked as one that should not be optimized (e.g., as ascertained by examining a relevant field in the header or by analysis of the payload data). This is because such a Fiber Channel data frame will not be optimized and there is no need to hold on to the end-of-frame delimiter waiting for the optimization processor to finish optimizing the optimizable portion because there is in fact no optimization to be done.
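The decisions of FIG. 18 described above can be summarized in a short routing sketch. It is illustrative only: a 40-bit word is modeled as a tuple of four 10-bit words, the K28.5 integer encodings assume the first transmitted bit is the most significant bit, and `is_primitive` and `is_sof` are hypothetical caller-supplied predicates standing in for the comparison against the protocol's actual primitive signal and delimiter codes.

```python
from enum import Enum

# K28.5 in its two running-disparity forms (integer bit order is an assumption).
K28_5 = {0b0011111010, 0b1100000101}


class Action(Enum):
    BYPASS = "bypass to output"
    HOLD = "hold for end-of-optimized-data flag"
    PARSE = "send to optimizable portion parser"


def route(word, is_primitive, is_sof, frame_is_optimized=True):
    """Decide how to handle one aligned 40-bit word (four 10-bit words)."""
    if word[0] not in K28_5:      # block 1802: header, payload, or CRC word
        return Action.PARSE       # block 1804
    if is_primitive(word):        # block 1810
        return Action.BYPASS
    if is_sof(word):              # block 1812
        return Action.BYPASS
    # Remaining case is an end-of-frame delimiter: hold it unless the frame
    # was marked as one that should not be optimized.
    return Action.HOLD if frame_is_optimized else Action.BYPASS
```

For example, `route((0b0011111010, 3, 0, 0), lambda w: w[1] == 1, lambda w: w[1] == 2)` returns `Action.HOLD`, since the word starts with K28.5 but matches neither placeholder predicate.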

As mentioned earlier, in connection with block 1804 of FIG. 18, the optimizable portion of a Fiber Channel data frame that can be optimized is passed on to an optimization front-end circuit 1720 (see FIG. 17) for further processing prior to actually being optimized by optimization processor 1410. Referring back to FIG. 17 now, in optimization front-end circuit 1720, the 40-bit words are de-framed into 10-bit words by a bus framing circuit 1742. In one embodiment, bus framing circuit 1742 is implemented by four 10-bit multiplexers that are selected by a counter. Thus, 40 bits of data are received in parallel and are separated into four 10-bit words, and the counter selects the 10-bit words in a round-robin fashion.
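A software analogue of this de-framing step is shown below. It assumes the first 10-bit word occupies the most significant bits of the 40-bit word, which the text does not specify; the function name `deframe` is illustrative.

```python
def deframe(word40):
    """Split one 40-bit word into four 10-bit words, first word first."""
    # The tuple of shifts plays the role of the counter selecting each 10-bit lane.
    return [(word40 >> shift) & 0x3FF for shift in (30, 20, 10, 0)]


word = (1 << 30) | (2 << 20) | (3 << 10) | 4
print(deframe(word))
```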

These 10-bit words are input into a protocol conversion circuit 1744, which converts the optimizable portion into a format acceptable for optimization by optimization processor 1410. In one embodiment, the 10-bit words received from bus framing circuit 1742 are converted to 8-bit words using a 10-bit/8-bit look-up table.

The use of a look-up table to convert 10-bit data to 8-bit data is well known in the art.

One implementation of such a 10-bit/8-bit look-up table may be found in, e.g., the aforementioned Kembel text.

The data to be optimized, now converted to 8-bit in the example of FIG. 17, is input into an end-of-optimization-file-processing circuit 1746, which tags or marks the last word of the optimizable portion of the Fiber Channel data frame with a flag to indicate to optimization processor 1410 that the 8-bit word so flagged represents the last 8-bit word of the file to be optimized for the current Fiber Channel data frame. In one embodiment, an extra bit is added to each 8-bit word received from protocol conversion circuit 1744. Consequently, 9-bit words are sent to optimization processor 1410 with one bit of each 9-bit word representing the end-of-optimization-file flag.

The last 9-bit word of the optimization file would have its end-of-optimization-file 1-bit flag set. When optimization processor 1410 receives these 9-bit words, a circuit in the optimization processor 1410 (e.g., an input FIFO within optimization processor 1410) performs the task of detecting the end of the optimization file, and strips away the additional flag bit after detection to allow the optimization core within optimization processor 1410 to operate only on the 8-bit words. In other words, the extra 1-bit is added to flag the end of the optimization file between transmit interface circuit 1408 and optimization processor 1410, and is stripped away before the optimizable portion of the Fiber Channel data frame is optimized (compressed and/or encrypted) by optimization processor 1410. In this manner, substantially no overhead is incurred by the optimization core (i.e., the actual compression/decompression engine or the encryption/decryption engine) within the optimization processor by this universal and flexible (i.e., easily adaptable to different incoming protocols) in-band signaling technique for communicating the end-of-optimized-file information between the transmit interface circuit and the optimization processor.
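The in-band flag may be illustrated with the following sketch. The position of the flag bit (bit 8, above the data byte) is an assumption, since the text states only that one bit of each 9-bit word carries the flag; the function names are hypothetical.

```python
EOF_FLAG = 0x100   # assumption: the flag is carried in bit 8 of each 9-bit word


def tag_end_of_file(data):
    """Widen 8-bit words to 9 bits and set the flag on the last word only."""
    words = [b & 0xFF for b in data]
    if words:
        words[-1] |= EOF_FLAG
    return words


def strip_flag(words9):
    """Receiver side: return the 8-bit data up to and including the flagged word."""
    data = []
    for word in words9:
        data.append(word & 0xFF)     # the optimization core sees only 8 bits
        if word & EOF_FLAG:          # end of the optimization file detected
            break
    return bytes(data)


tagged = tag_end_of_file(b"\x01\x02\xff")
print(tagged, strip_flag(tagged))
```

Because the flag travels with the data, the receiver needs no advance knowledge of the length of the optimizable portion.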

In another embodiment, transmit interface circuit 1408 may flag the end of the optimization file by other means, such as by a dedicated signal (out-of-band signaling vs. in-band signaling). In this case, the data may be sent, using the above example, as 8-bit data. In any case, the optimizable portion of the Fiber Channel data frame is then optimized by optimization processor 1410, and sent back to transmit interface circuit 1702 as 8-bit words via a bus 1430. Optimization processor 1410 also generates a unique end-of-optimized-data flag in the optimized data sent back to transmit interface circuit 1702 via bus 1430. As discussed earlier, this end-of-optimized-data flag is detected by an end-of-optimized-data-flag-handler circuit 1740.

The optimized data is then converted back to 10-bit via protocol conversion circuit 1760, which, in the case of FIG. 17, is a conventional 8-bit/10-bit table look-up. Thus, 10-bit words are sent from protocol conversion circuit 1760 to a bus framing circuit 1762 (via a bus 1768) to frame four 10-bit words into one 40-bit word for output to a multiplexer 1764. In one embodiment, bus framing circuit 1762 is implemented using four shift registers and a counter that shifts, in a round-robin fashion, the first, second, third, and fourth 10-bit words into a 40-bit word, and outputs the 40-bit word to multiplexer 1764. For the last 40-bit word of the optimized data, bus framing circuit 1762 also pads the data so that a full 40-bit word is sent to multiplexer 1764.
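The framing of four 10-bit words into one 40-bit word may be modeled, purely as a sketch, by the following Python function. The placement of the first 10-bit word in the most significant bits and the value used for padding are assumptions; the text specifies neither, and the name frame_40bit is hypothetical.

```python
def frame_40bit(words10, pad_word=0):
    """Pack 10-bit words, four at a time, into 40-bit words.

    Assumptions for this sketch: the first 10-bit word of each group lands
    in the top ten bits, and a short final group is filled with pad_word.
    """
    framed = []
    for i in range(0, len(words10), 4):
        group = list(words10[i:i + 4])
        group += [pad_word] * (4 - len(group))  # pad the final, partial group
        value = 0
        for word in group:
            if not 0 <= word < (1 << 10):
                raise ValueError("expected 10-bit words")
            value = (value << 10) | word
        framed.append(value)
    return framed
```

A hardware implementation would use the shift registers and counter described above; the loop here only mirrors the resulting bit layout.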

Thus, as each 40-bit word is received from frame alignment circuit 1712, traffic controller circuit 1714 ascertains whether the 40-bit word received is a primitive signal word, a start-of-frame delimiter, or an end-of-frame delimiter. If a primitive signal word is detected, it is immediately bypassed via bypass bus 1722 and multiplexer 1764 to output FIFO 1724. Multiplexer 1764 merely selects, based on whether data is bypassed via bypass bus 1722 or sent through bus framing circuit 1762, whether output FIFO 1724 will receive data from the bypass bus 1722 or from bus framing circuit 1762. If a start-of-frame delimiter is detected, traffic controller circuit 1714 immediately bypasses the start-of-frame delimiter to output FIFO 1724 via bypass bus 1722 and multiplexer 1764. The start-of-frame delimiter then waits in output FIFO 1724 to be assembled with the optimized data sent back by optimization processor 1410. The non-optimizable portion of the Fiber Channel frame is also bypassed directly to output FIFO 1724 (see FIG. 18) via bypass bus 1722 and multiplexer 1764.

If the 40-bit word is neither a primitive signal word nor a start-of-frame or end-of-frame delimiter, traffic controller circuit 1714 sorts the incoming 40-bit word as either an optimizable portion or a non-optimizable portion (as discussed in FIG. 18). The optimizable portion is then processed by optimization front-end circuit 1720 and optimization processor 1410, and received as optimized data to be assembled with the waiting start-of-frame delimiter and any bypassed non-optimizable portion (such as the header).

After the end-of-optimized-data flag is detected in the optimized data stream coming from optimization processor 1410 by end-of-optimized-data-flag-handler circuit 1740, a new CRC may be calculated and assembled with the optimized data in output FIFO 1724. The detection of the end-of-optimized-data flag by end-of-optimized-data-flag-handler circuit 1740 also permits traffic controller circuit 1714 to release a polarity-correct version of the end-of-frame delimiter it stored earlier for the current Fiber Channel data frame. This end-of-frame delimiter is bypassed via bypass bus 1722 and multiplexer 1764 to be assembled with the waiting but incomplete Fiber Channel data frame in output FIFO 1724.

As mentioned earlier, transmit interface circuit 1702 also performs congestion control to ensure that optimization processor 1410 is not overloaded when data arrives at data optimization engine 1402 in rapid bursts. In one embodiment, when traffic controller circuit 1714 detects an end-of-frame delimiter, it waits until after processing of the current Fiber Channel data frame is finished before it receives the next Fiber Channel data frame for processing. For example, it may wait until it receives a signal from end-of-optimized-data-flag-handler circuit 1740, indicating that optimization processor 1410 has finished processing the current optimizable portion of the current Fiber Channel data frame, before it receives additional data from frame alignment circuit 1712. In the meantime, FIFO 1710 may act as a shock absorber to absorb the data bursts while waiting for optimization processor 1410 to finish its current processing.

In one embodiment, the transmit interface circuit 1408 also marks the header of the optimized Fiber Channel data frame so that the frame may be recognized in the future as one that contains optimized data. This marking helps another data optimization engine to ascertain whether a Fiber Channel data frame has been optimized earlier by a data optimization engine.

FIG. 19 illustrates, in accordance with one embodiment of the present invention, how end-of-optimized-data-flag-handler circuit 1740 handles optimized data received from the optimization processor 1410 and detects an end-of-optimized-data flag in the stream of optimized data received. In FIG. 19, optimized data is received from optimization processor 1410 via a bus 1430 (shown in both FIGS. 17 and 19). Since the word size of the optimized data words received from optimization processor 1410 may differ from the actual size of the codes output by the compressor and/or encryption engine, a strategy needs to be developed to ensure that the end-of-optimized-data flag can be reliably detected.

In one embodiment, the optimization processor 1410 implements the aforementioned high-speed optimized compression algorithm, and yields 11 bits of code for the incoming 8-bit words into the optimization processor. The use of 11 bits is advantageous since it allows the use of a dictionary that can compress the entire Fiber Channel payload (2112 bytes maximum) without a significant possibility of overflowing. In this case, although the optimized data received from bus 1430 are words that are 8 bits long each (which is the size of data words expected by the transmit interface circuit), the 11-bit codes are packed into 8-bit words and sent in frames of 88 bits (8×11).

In the present example, 11 bits of code are generated for the incoming 8-bit data words by optimization processor 1410 implementing an adaptive compression scheme (such as LZW or the aforementioned inventive HSO). In block 1902, it is ascertained whether the last 11 bits of the 88-bit frame of optimized data received from optimization processor 1410 contain the hex value 7FF. This is because in this example, the hex value 7FF is chosen as the special end-of-optimized-data flag to allow optimization processor 1410 to flag to transmit interface circuit 1702 that this particular data frame contains the last of the optimized data. If the optimized data does not fill up the 88-bit frame, the remainder of the 88-bit frame may be padded with 1's to make sure that the last 11 bits would contain the hex value 7FF. However, end-of-optimized-data-flag-handler circuit 1740 may simply look, in one embodiment, for this specific pattern (or another unique pattern designated to represent the end-of-optimized-data flag) anywherewithin the 88-bit frame.
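The packing of 11-bit codes into an 88-bit frame and the check of block 1902 may be sketched in Python as follows. This is a simplified model under stated assumptions: codes are packed most significant bit first, the value 7FF is reserved and never emitted as a data code, and only the "last 11 bits" variant of the check is shown. The names pack_frame and ends_with_flag are hypothetical.

```python
EOD_FLAG = 0x7FF      # end-of-optimized-data flag from the example above
FRAME_BITS = 88       # eight 11-bit codes, delivered as eleven 8-bit words


def pack_frame(codes11):
    """Pack up to eight 11-bit codes into an 88-bit frame (11 bytes).

    A frame holding fewer than eight codes is padded with 1's, so its last
    11 bits read 0x7FF. Assumption: 0x7FF is never used as a data code.
    """
    if len(codes11) > 8:
        raise ValueError("at most eight 11-bit codes fit in one frame")
    bits = 0
    for code in codes11:
        if not 0 <= code < EOD_FLAG:
            raise ValueError("codes must be below the reserved value 0x7FF")
        bits = (bits << 11) | code
    pad = FRAME_BITS - 11 * len(codes11)
    bits = (bits << pad) | ((1 << pad) - 1)  # fill the remainder with 1's
    return bits.to_bytes(FRAME_BITS // 8, "big")


def ends_with_flag(frame):
    """Block 1902: do the last 11 bits of the 88-bit frame equal 0x7FF?"""
    if len(frame) != FRAME_BITS // 8:
        raise ValueError("expected an 11-byte frame")
    return int.from_bytes(frame, "big") & EOD_FLAG == EOD_FLAG
```

A full frame of eight data codes is reported as not final, while any shorter frame is padded and therefore carries the flag.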

In one embodiment, the unique 11-bit code 7FF that represents the EOF may straddle a maximum of 3 consecutive bytes. In this case, monitoring for 3 consecutive 7FF bytes will ensure that EOF will be detected in the data stream. In another aspect of the present invention, padding is performed after the 3 consecutive 7FF bytes until the frame reaches 32 bits, which is the word size (for 8-bit encoding) for the Fiber Channel payload. If another protocol is employed, padding is performed on the last frame to add to the byte that contains the EOF until the last frame reaches a size that would be outputted from the data optimization engine.

If the end-of-optimized-data flag is detected in the 88-bit optimized data frame, the end-of-optimized-data-flag-handler circuit 1740 signals (in block 1904) to traffic controller circuit 1714 to bypass the end-of-frame delimiter with the correct polarity that it stored earlier to output FIFO 1724. In this manner, a universal and flexible (i.e., easily adaptable to different incoming protocols) in-band signaling technique for communicating the end-of-optimized-data information between the optimization processor and the transmit interface circuit is accomplished.

With respect to the polarity of the end-of-frame delimiter, in one embodiment, when traffic controller circuit 1714 detects an end-of-frame delimiter in the incoming data stream, it stores both the CRD+ (Current Running Disparity) and CRD− versions of the end-of-frame delimiter detected for an optimizable Fiber Channel data frame. When end-of-optimized-data-flag-handler circuit 1740 signals that the end-of-frame delimiter, with the correct polarity, should be bypassed to output FIFO 1724, traffic controller circuit 1714 consults protocol conversion circuit 1760 to determine whether the positive or the negative polarity version should be sent onward to output FIFO 1724. This decision is based on the polarity of the last word of optimized data converted by protocol conversion circuit 1760. In any case, the optimized data received from bus 1430 is passed on to protocol conversion circuit 1760 (in step 1906) to be converted to 10-bit data. Note that this unique code signifying the end of the optimized data remains embedded in the optimized data stream after protocol conversion, and is detectable by the receive interface circuit (1412 of FIG. 14) when it comes time to “de-optimize” the data.

As is well known, different words in the Fiber Channel data frame may have different polarities as specified by the Fiber Channel specification. FIG. 20 illustrates, in accordance with one embodiment, how protocol conversion circuit 1760 may perform the protocol conversion such that output words having the correct polarities may be output to bus framing circuit 1762 for eventual output to output FIFO 1724 (see FIG. 17). In FIG. 20, the protocol conversion from 8-bit words to 10-bit words is performed using an 8-bit/10-bit table look-up. However, the 8-bit/10-bit table is pre-processed to generate two different tables: a CRD+ table and a CRD− table. The CRD+ table has entries for translating 8-bit words into CRD+ 10-bit words. The CRD− table has entries for translating 8-bit words into CRD− 10-bit words.

Furthermore, there is a neutral flag in the form of an extra bit in each entry. This extra bit may be appended or pre-pended to the 10-bit code, or may be a separate column altogether. For each 10-bit word in a table entry (either a CRD+ or a CRD− entry), if the numbers of 0's and 1's are equal in the 10-bit code, the neutral flag is set. The use of two polarity tables and a neutral flag allows the protocol conversion circuit to rapidly generate the polarity-correct 10-bit words for output.

In the flowchart of FIG. 20, for each 8-bit code input into protocol conversion circuit 1760 (block 2010), it is ascertained in block 2012 whether the previous 10-bit code output is positive in polarity, negative in polarity, or neutral (i.e., the numbers of 0's and 1's are equal in the 10-bit code and it is flagged as being neutral). If the previous 10-bit code is output from the CRD+ table and the neutral flag of the previous 10-bit code is not set, then the previous 10-bit code is deemed to be positive for the purposes of block 2012. On the other hand, if the previous 10-bit code is output from the CRD− table, and the neutral flag of the previous 10-bit code is not set, then the previous 10-bit code is deemed to be negative in polarity for the purposes of block 2012. If the previous 10-bit code is output from either the CRD+ or the CRD− table, but the neutral flag is set, then the previous 10-bit code is deemed to be neutral for the purposes of block 2012.

In the case of a previously negative 10-bit code, the next 10-bit code to be output comes from the CRD+ table, as seen in block 2014. In the CRD+ table, the 10-bit code is then obtained (or 11-bit code if the one neutral flag bit is directly appended or pre-pended to the 10-bit code). This is shown in block 2016. In block 2018, the flag bit is removed, and the 10-bit code is output (in block 2024) to the bus framing circuit 1762 (see FIG. 17). In the case where the previous 10-bit code is positive in polarity, the next 10-bit code is obtained from the CRD− table (as shown in block 2020). Thereafter, the 10-bit code is obtained and forwarded to bus framing circuit 1762. If the previous 10-bit code is neutral, the next 10-bit code is obtained from the table that was used to obtain the previous 10-bit code. This is shown in block 2022. In so doing, the 11-bit code is obtained (2016), the flag bit is stripped (2018) and the 10-bit code is passed (2024) onto bus framing circuit 1762.
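The table selection of FIG. 20 may be sketched in Python as shown below. The two small tables hold made-up placeholder codes chosen only so that one entry is unbalanced and one is balanced; they are not the actual 8-bit/10-bit code tables. The starting table and the names build_table, TABLES and encode are likewise assumptions of this sketch.

```python
def build_table(codes):
    """Pre-process a table: attach a neutral flag to each 10-bit code.
    The flag is set when the code has equal numbers of 0's and 1's."""
    return {byte: (code, bin(code).count("1") == 5) for byte, code in codes.items()}


# Placeholder entries only (NOT real 8b/10b codes): 0x00 is unbalanced,
# 0x01 is balanced and therefore neutral in both tables.
TABLES = {
    "CRD+": build_table({0x00: 0b1110110100, 0x01: 0b1010101010}),
    "CRD-": build_table({0x00: 0b0001001011, 0x01: 0b1010101010}),
}


def encode(data8, start_table="CRD-"):
    """Convert 8-bit words to 10-bit words, choosing the table per block 2012."""
    table = start_table            # assumption: table used for the first word
    prev = None                    # (table name, neutral flag) of previous output
    out = []
    for byte in data8:
        if prev is not None:
            prev_table, prev_neutral = prev
            if prev_neutral:
                table = prev_table             # block 2022: reuse the same table
            elif prev_table == "CRD+":
                table = "CRD-"                 # block 2020: previous was positive
            else:
                table = "CRD+"                 # block 2014: previous was negative
        code, neutral = TABLES[table][byte]    # block 2016: look up code and flag
        out.append(code)                       # blocks 2018/2024: flag not emitted
        prev = (table, neutral)
    return out, prev
```

The second return value records which table produced the last code and whether that code was neutral, which is the information a traffic controller would need when choosing an end-of-frame delimiter polarity.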

Note that the polarity of the last 10-bit code of the optimized data stream is also employed to determine the polarity of the end-of-frame delimiter to be bypassed by traffic controller circuit 1714 of FIG. 17 to output FIFO 1724 to complete the Fiber Channel data frame encapsulating the optimized data to be output onto the media. If the polarity of the last 10-bit code of the optimized data stream is positive, a negative end-of-frame delimiter is sent. Conversely, if the polarity of the last 10-bit code of the optimized data stream is negative, a positive end-of-frame delimiter is sent.

FIG. 21 shows, in accordance with one embodiment of the present invention, a receive interface circuit 2102 in greater detail. The receive interface circuit reverses the process performed by the transmit interface circuit, with some important differences as discussed herein. The incoming serial data stream is first converted by the receive side SERDES (1462 in FIG. 14) to 10-bit words and received at bus 1464. Generally speaking, bus 1464 is a parallel bus, but it may also be a high-speed serial bus, for example. If bus 1464 is a 10-bit parallel bus, bus 1464 typically operates at between around 100 MHz and around 125 MHz to yield roughly one gigabit per second or slightly above. In the case of Fiber Channel data, bus 1464, as a 10-bit parallel bus, may run at roughly 106 MHz. In the case of gigabit Ethernet data (which is not the case in FIG. 21), bus 1464 may run at, for example, 125 MHz.

A FIFO 2110 converts the 10-bit data on bus 1464 into 40-bit data. Besides performing framing of the incoming data from 10 bits to 40 bits, FIFO 2110 also acts as a shock absorber to absorb data bursts coming in via bus 1464. Framing the incoming data as 40-bit words allows receive interface circuit 2102 to operate on a longer word, thereby enabling receive interface circuit 2102 to operate at a lower clock speed while still maintaining a high throughput. Framing the incoming data as 40-bit words also makes it simpler to perform frame alignment in frame alignment circuit 2112.

Frame alignment circuit 2112 looks for the 10-bit K28.5 word within each 40-bit word. If it finds the 10-bit K28.5 word, that 10-bit K28.5 word and the next three 10-bit words are considered, as a 40-bit word unit, to be either an FC fill 40-bit word (1504 in FIG. 15), a start-of-frame delimiter (1510 in FIG. 15), or an end-of-frame delimiter (1522 in FIG. 15). Using the start of the 10-bit K28.5 word to frame the 40-bit words received into receive interface circuit 2102 accomplishes frame alignment by ensuring that the beginning of the start-of-frame delimiter 1510 can be accurately framed, or aligned, with respect to a reference 40-bit word. Consequently, the frame header 1512, as well as payload 1514, can also be properly framed with respect to reference 40-bit words and analyzed.
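A simplified model of this alignment step is sketched below in Python. It operates on a list of 10-bit words rather than on 40-bit bus words, drops any words that precede the first K28.5, and writes the two K28.5 code words with the first transmitted bit on the left; these simplifications, and the name align, are assumptions of the sketch.

```python
# The two 10-bit encodings of K28.5 (one per running disparity), written
# here with the first transmitted bit as the leftmost digit.
K28_5 = {0b0011111010, 0b1100000101}


def align(words10):
    """Group 10-bit words into 4-word (40-bit) units that start at the
    first K28.5 word. Words before that K28.5, and a trailing partial
    group, are dropped in this simplified model."""
    for start, word in enumerate(words10):
        if word in K28_5:
            break
    else:
        return []  # no K28.5 seen, so no reference boundary is known
    return [tuple(words10[i:i + 4]) for i in range(start, len(words10) - 3, 4)]
```

Once the first unit is anchored on a K28.5 word, every later unit falls on the same 40-bit boundary, which is what allows the header and payload words to be located by position.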

After frame alignment circuit 2112 frames the incoming data stream, the 40-bit words are passed to traffic controller circuit 2114 for further processing. Traffic controller circuit 2114 receives the 40-bit words from frame alignment circuit 2112, and ascertains whether a received 40-bit word is a primitive signal word, a start-of-frame delimiter, one of the frame header 40-bit words, a 40-bit CRC word, a 40-bit end-of-frame delimiter, or part of the data payload. Since the primitive signal words and the start-of-frame delimiter are aligned with 40-bit reference words by frame alignment circuit 2112, the parsing of a Fiber Channel data frame into its constituent parts can be achieved with the knowledge of the relative positions of each 40-bit word in the Fiber Channel data frame, i.e., relative to one another and/or relative to the start-of-frame delimiter and/or the end-of-frame delimiter (as discussed in FIG. 15).

In one embodiment, the traffic controller circuit 2114 may check an appropriate flag in one of the fields in the frame header, which flag is set by the transmit interface circuit or the optimization circuitry earlier, to see if this Fiber Channel data frame had been optimized before. If decryption is involved, the traffic controller may also, alternatively or additionally, check for the presence of an encryption key (assuming public key encryption was involved) to determine if this Fiber Channel data frame had been optimized before. If it had not been optimized before, the entire Fiber Channel data frame, up to the end-of-frame delimiter, may be immediately bypassed to the output of receive interface circuit 2102 via bypass bus 2122, thereby rendering the data optimization engine substantially transparent with respect to the Fiber Channel data frames previously not optimized.

In another embodiment, as each 40-bit word is received from frame alignment circuit 2112, traffic controller circuit 2114 first checks to see whether the first 10 bits of that 40-bit word are a 10-bit K28.5 word. If the first 10 bits of the incoming 40-bit word from frame alignment circuit 2112 are not a 10-bit K28.5 word, that 40-bit word must be either one of the frame header 40-bit words (1512 in FIG. 15), part of the data payload (1514 in FIG. 15), or a 40-bit CRC word (1520 in FIG. 15).

In this case, the 40-bit word is passed to a de-optimizable portion parser, which ascertains whether the 40-bit word received is part of the de-optimizable portion of the Fiber Channel data frame, or part of the non-de-optimizable portion of the Fiber Channel data frame. In one preferred embodiment, only the data payload (1514 of FIG. 15) is de-optimizable, i.e., eligible to be processed further via decompression and/or decryption by optimization processor 1410. In another embodiment, even a whole or a portion of the frame header (1512 of FIG. 15), and/or the CRC 40-bit word (1520 of FIG. 15) may also be eligible to be de-optimized further via decompression or decryption by optimization processor 1410. Typically, however, when only the payload is de-optimized, the CRC is recalculated by receive interface circuit 2102 for each Fiber Channel data frame that has been de-optimized and thus the CRC does not need to be de-optimized. Irrespective of the specific implementation of the de-optimizable portion parser, the 40-bit word deemed to be part of the non-de-optimizable portion is allowed to bypass directly to the output of receive interface circuit 2102 while the de-optimizable portion is further processed.

The header and/or payload is further analyzed to determine if the Fiber Channel data frame should not be de-optimized (in some cases, one or more fields in the header may indicate that this particular Fiber Channel data frame should not be de-optimized). In this case, even the de-optimizable portion (i.e., the portion eligible to be decompressed and/or decrypted by optimization processor 1410) would also be bypassed directly to the output of receive interface circuit 2102 via bus 2122, thereby allowing the payload, header, and/or CRC portions of the Fiber Channel data frame to transparently pass through receive interface circuit 2102 without modification or significant processing.

On the other hand, if it is ascertained that the de-optimizable portion should be de-optimized (due to a detection that the Fiber Channel data frame was optimized earlier or due to the presence of a public key), the de-optimizable portion is then passed on to optimization front-end circuit 2120 (shown in FIG. 21) for further processing.

If the first 10 bits of the 40-bit word received from frame alignment circuit 2112 are indeed a 10-bit K28.5 word, this 40-bit word is either a primitive signal word, a start-of-frame delimiter, or an end-of-frame delimiter. If the received 40-bit word is a primitive signal word (as ascertained in block 1810 of FIG. 18), the primitive signal word is bypassed directly to the output of receive interface circuit 2102 via bypass bus 2122.

In one embodiment, traffic controller circuit 2114 monitors a threshold level at output FIFO 2124 (see FIG. 21) and outputs additional Idle words (or one of the fill words) to output FIFO 2124 to essentially cause output FIFO 2124 to output Idle words from receive interface circuit 2102. In one embodiment, two fill words are output whenever the FIFO level is below a certain threshold. This is useful since the Fiber Channel protocol expects there to be protocol-acceptable data on the communication channel at all times. If optimization processor 1410 is busy de-optimizing a particularly long Fiber Channel data frame, traffic controller circuit 2114 fills the communication channel with protocol-acceptable data instead of allowing gibberish data to appear on the communication channel. In one embodiment, the Idle words may come from the output FIFO 2124 itself (as opposed to from the traffic controller circuit). The threshold within output FIFO 2124 that triggers the output of additional Idle words may be set via software during configuration or execution, or may be adaptively changed based on the traffic pattern and bandwidth usage pattern of the incoming data stream.
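The threshold-triggered insertion of fill words may be sketched in Python as follows. The sketch models output FIFO 2124 as a deque and represents each 40-bit fill word by a placeholder string; the names IDLE and top_up, and the use of occupancy in words as the monitored level, are assumptions.

```python
from collections import deque

IDLE = "IDLE"  # placeholder standing in for a 40-bit Idle (fill) word


def top_up(fifo, threshold, fill_count=2):
    """Append fill words when the FIFO occupancy is below the threshold.

    Returns the number of fill words added. The default of two fill words
    follows the embodiment described above.
    """
    if len(fifo) < threshold:
        fifo.extend([IDLE] * fill_count)
        return fill_count
    return 0
```

For example, calling top_up on a FIFO holding a single word with a threshold of three adds two Idle words, after which a second call adds none. Because the threshold is an ordinary parameter, it can be changed at configuration time or adapted while running.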

On the other hand, if it is ascertained that the incoming 40-bit word starts with a 10-bit K28.5 word but that 40-bit word is not a primitive signal word, a further decision is made, which ascertains whether the incoming 40-bit word is the start-of-frame delimiter or the end-of-frame delimiter. If the incoming 40-bit word is ascertained to be a start-of-frame delimiter, this start-of-frame delimiter is immediately bypassed to the output of receive interface circuit 2102 via bypass bus 2122. On the other hand, if it is ascertained in block 1812 that the incoming 40-bit word is an end-of-frame delimiter, the end-of-frame delimiter is held by traffic controller circuit 2114 until traffic controller circuit 2114 receives a signal from an end-of-de-optimized-data-flag-handler circuit 2140 (see FIG. 21) that indicates that traffic controller circuit 2114 can release a polarity-correct version of the end-of-frame delimiter to the output of receive interface circuit 2102. A technique for selecting the polarity-correct end-of-frame delimiter based on the polarity of the words previously examined has been discussed in connection with the transmit interface circuit of FIG. 17.

Furthermore, the end-of-frame delimiter is also bypassed to the output of receive interface circuit 2102 if it turns out that the de-optimizable portion belongs to a Fiber Channel data frame that should not be de-optimized (e.g., as ascertained by examining a relevant field in the header or by analysis of the payload data). This is because such a Fiber Channel data frame will not be de-optimized, and there is no need to hold on to the end-of-frame delimiter waiting for the optimization processor to finish de-optimizing the de-optimizable portion because there is in fact no de-optimization to be done.

The de-optimizable portion of a Fiber Channel data frame that can be de-optimized is passed on to a de-optimization front-end circuit 2120 (see FIG. 21) for further processing prior to actually being de-optimized by optimization processor 1410. Referring back to FIG. 21 now, in de-optimization front-end circuit 2120, the 40-bit words are de-framed into 10-bit words by a bus framing circuit 2142. In one embodiment, bus framing circuit 2142 is implemented by four 10-bit multiplexers that are selected by a counter. Thus, 40 bits of data are received in parallel and are separated into groups of four 10-bit words, and the counter selects the 10-bit words in a round-robin fashion.
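The de-framing of a 40-bit word into four 10-bit words may be sketched in Python as follows. The sketch assumes that the first 10-bit word occupies the most significant ten bits of the 40-bit word; the name deframe_40bit is hypothetical.

```python
def deframe_40bit(word40):
    """Split one 40-bit word into four 10-bit words, first word first.

    Assumption: the first 10-bit word sits in the top ten bits. The tuple
    of shift amounts plays the role of the counter that steps through the
    four multiplexer inputs in round-robin order.
    """
    if not 0 <= word40 < (1 << 40):
        raise ValueError("expected a 40-bit word")
    return [(word40 >> shift) & 0x3FF for shift in (30, 20, 10, 0)]
```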

These 10-bit words are input into a protocol conversion circuit 2144, which converts the de-optimizable portion into a format acceptable for de-optimization by optimization processor 1410. In one embodiment, the 10-bit words received from bus framing circuit 2142 are converted to 8-bit words using a 10-bit/8-bit look-up table. The use of a look-up table to convert 10-bit data to 8-bit data is well known in the art. Information regarding 8b/10b encoding and decoding may be obtained, for example, from the aforementioned Kembel text.

The de-optimizable portion of the Fiber Channel data frame is then de-optimized (decompressed and/or decrypted) by optimization processor 1410, and sent back to receive interface circuit 2102 as 8-bit words via a bus 1430. Optimization processor 1410 can ascertain the end of the de-optimized data file by detecting the end-of-optimized-data flag previously provided with the de-optimizable portion during the optimization process. This end-of-optimized-data flag is also detected by an end-of-de-optimized-data-flag (EODD) handler circuit 2140.

The de-optimized data is then converted back to 10-bit via protocol conversion circuit 2160, which, in the case of FIG. 21, is a conventional 8-bit/10-bit table look-up. Thus, 10-bit words are sent from protocol conversion circuit 2160 to a bus framing circuit 2162 (via a bus 2168) to frame four 10-bit words into one 40-bit word for output to a multiplexer 2164. Multiplexer 2164 merely selects, based on whether data is bypassed via bypass bus 2122 or sent through bus framing circuit 2162, whether output FIFO 2124 will receive data from the bypass bus 2122 or from bus framing circuit 2162. In one embodiment, bus framing circuit 2162 is implemented using four shift registers and a counter that shifts, in a round-robin fashion, the first, second, third, and fourth 10-bit words into a 40-bit word, and outputs the 40-bit word to multiplexer 2164. For the last 40-bit word of the de-optimized data, bus framing circuit 2162 also pads the data so that a full 40-bit word is sent to multiplexer 2164.

After the end-of-de-optimized-data flag is detected in the de-optimized data stream coming from optimization processor 1410 by end-of-de-optimized-data-flag-handler circuit 2140, a new CRC may be calculated and assembled with the de-optimized data in output FIFO 2124. The detection of the end-of-de-optimized-data flag by end-of-de-optimized-data-flag-handler circuit 2140 also permits traffic controller circuit 2114 to release a polarity-correct version of the end-of-frame delimiter it stored earlier for the current Fiber Channel data frame. This end-of-frame delimiter is bypassed via bypass bus 2122 and multiplexer 2164 to be assembled with the waiting but incomplete Fiber Channel data frame in output FIFO 2124.

As mentioned earlier, receive interface circuit 2102 also performs congestion control to ensure that optimization processor 1410 is not overloaded when data arrives at data optimization engine 1402 in rapid bursts. In one embodiment, when traffic controller circuit 2114 detects an end-of-frame delimiter, it waits until after processing of the current Fiber Channel data frame is finished before it receives the next Fiber Channel data frame for processing. For example, it may wait until it receives a signal from end-of-de-optimized-data-flag-handler circuit 2140, indicating that optimization processor 1410 has finished processing the current de-optimizable portion of the current Fiber Channel data frame, before it receives additional data from frame alignment circuit 2112. In the meantime, FIFO 2110 may act as a shock absorber to absorb the data bursts while waiting for optimization processor 1410 to finish its current processing.

While this invention has been described in terms of several preferred embodiments, there are alterations, permutations, and equivalents which fall within the scope of this invention. For example, although the Fiber Channel protocol has been a preferred embodiment discussed in detail, it should be understood that the modular architecture of the data optimization engine herein, its ability to work with different protocols, the HSO compression technique, and other innovative techniques and arrangements described herein, may be readily applicable to any protocol, including packet-based protocols such as Ethernet, TCP/IP, etc. When packet-oriented protocols are involved, processing by the data optimization engine is performed on a packet-by-packet basis. It should also be noted that there are many alternative ways of implementing the methods and apparatuses of the present invention. It is therefore intended that the following appended claims be interpreted as including all such alterations, permutations, and equivalents as fall within the true spirit and scope of the present invention.

1. A data optimization engine for optimizing selected frames of a first stream of data, comprising: a transmit interface circuit coupled to an optimization processor, said transmit interface circuit being configured for receiving said first stream of data, said transmit interface circuit includes a traffic controller circuit for separating frames in said first stream of data into a first optimizable frame and a first non-optimizable frame, and an optimization front-end circuit coupled to said traffic controller circuit to receive at least a first portion of said first optimizable frame, said optimization front-end circuit including a protocol conversion circuit configured to convert data in said first portion of said first optimizable frame from a first protocol to a second protocol suitable for processing by said optimization processor, said first protocol specifies a first word length, said second protocol specifies a second word length different from said first word length, said optimization front-end circuit further includes an end-of-optimization-file processing circuit, said end-of-optimization-file processing circuit flagging an end of said first portion of said first optimizable frame to said optimization processor, wherein said optimization processor is configured to optimize said first portion of said first optimizable frame by performing at least one of compression and encryption on said first portion of said first optimizable frame.
2. The data optimization engine of claim 1 wherein said end-of-optimization-file flagging processing circuit is configured to add, after said data in said first portion of said first optimizable frame is converted from said first protocol to said second protocol, an end-of-optimization-file flag to each word sent from said transmit interface circuit to said optimization processor.
3. The data optimization engine of claim 2 wherein said end-of-optimization-file flag is one bit long.
4. The data optimization engine of claim 1 wherein said first protocol is the 10-bit interface protocol, and said protocol conversion circuit includes a 10-bit/8-bit lookup table.
5. The data optimization engine of claim 4 further including a frame alignment circuit for detecting and aligning a start of a primitive signal word in said first stream of data with a start of a reference 40-bit word, thereby framing said primitive signal word with respect to said reference 40-bit word.
6. The data optimization engine of claim 5 wherein said frame alignment circuit detects said start of said primitive signal word by monitoring for a K28.5 10-bit word in said first stream of data.
7. The data optimization engine of claim 4 further including an output FIFO coupled to said traffic controller circuit and said optimization front-end circuit, said traffic controller circuit further includes a start-of-frame handler circuit and an end-of-frame handler circuit, said start-of-frame handler circuit is configured to detect a start-of-frame 40-bit word in said first optimizable frame and to send said start-of-frame 40-bit word to said output FIFO, effectively bypassing said optimization front-end circuit, said end-of-frame handler circuit is configured to detect an end-of-frame 40-bit word in said first optimizable frame and to temporarily retain said end-of-frame 40-bit word while waiting for said optimization processor to complete optimizing said first portion of said first optimizable frame, said end-of-frame handler circuit is further configured to furnish a polarity-correct version of said end-of-frame 40-bit word to said output FIFO for appending to first optimized data within said output FIFO, said first optimized data represents a first optimized version of said first portion of said first optimizable frame after being optimized by said optimization processor.
8. The data optimization engine of claim 7 wherein said transmit interface circuit further includes an end-of-optimized-data flag handler circuit coupled to receive second optimized data from said optimization processor, said second optimized data represents a second optimized version of said first portion of said first optimizable frame after being optimized by said optimization processor, said end-of-optimized data flag handler being configured to detect an end-of-optimized data flag in said second optimized data and signals, upon detecting said end-of-optimized data flag in said second optimized data, said end-of-frame handler circuit to furnish said polarity-correct version of said end-of-frame 40-bit word to said output FIFO.
9. The data optimization engine of claim 1, wherein the data optimization engine is configured to be deployed in a Fiber Channel setting.
10. The data optimization engine of claim 1, wherein the data optimization engine is configured to be interposed between a Fiber Channel controller and a serializer/deserializer.
11. The data optimization engine of claim 1, wherein the data optimization engine is configured to work in conjunction with protocols selected from a group of protocols comprising: Ethernet protocols, Extended Attachment Unit Interface (XAUI) protocols, or I-SCSI protocols.
12. The data optimization engine of claim 1, wherein the data optimization engine is configured to be deployed between a host device and a storage device.
13. The data optimization engine of claim 12, wherein the data optimization engine is configured to be deployed between the host device and multiple different types of interfaces.
14. The data optimization engine of claim 13, wherein said multiple different types of interfaces comprise interfaces selected from a group of interfaces comprising at least: a Fiber Channel interface, an Ethernet interface, a SCSI interface and an Infiniband interface.
15. The data optimization engine of claim 1, wherein the data optimization engine is configured to be deployed between a CPU and a memory.
16. The data optimization engine of claim 1, wherein the data optimization engine is configured to be deployed between networked devices.
17. The data optimization engine of claim 16, wherein said networked devices comprise networked devices selected from a group of networked devices comprising: a network interface card, a router, or a switch.
18. The data optimization engine of claim 16, wherein a network associated with the networked devices is a network in which only routers and switches at an edge of the network perform compression/decompression and/or encryption/decryption.
19. The data optimization engine of claim 1, wherein the data optimization engine is configured to be interposed between two PCI devices.
20. The data optimization engine of claim 19, wherein the data optimization engine is configured to: process memory write transactions between the two PCI devices for possible encryption and/or compression; and process memory read transactions between the two PCI devices for possible decryption and/or decompression.
21. A method comprising: receiving a stream of data; separating frames in said stream of data into a first optimizable frame and a first non-optimizable frame; converting data in a first portion of said first optimizable frame from a first protocol to a second protocol suitable for processing by an optimization processor, wherein said first protocol specifies a first word length and said second protocol specifies a second word length different from said first word length; adding an indication to an end of said first portion of said first optimizable frame for the optimization processor; and in response to said adding, performing, with said optimization processor, at least one of compression or encryption on said first portion of said first optimizable frame.
22. The method of claim 21, wherein said adding said indication comprises adding a flag indicating an end-of-optimization to a last word of said first portion of said first optimizable frame.
23. The method of claim 22, wherein said flag is one bit long.
24. The method of claim 21 wherein said first protocol is a 10-bit interface protocol, and said converting comprises using a 10-bit/8-bit lookup table.
25. The method of claim 24 further comprising detecting and aligning a start of a primitive signal word in said first stream of data with a start of a reference 40-bit word effective to frame said primitive signal word with respect to said reference 40-bit word.
26. The method of claim 25 wherein said detecting is performed by monitoring for a K28.5 10-bit word in said first stream of data.
27. The method of claim 24 further comprising: detecting a start-of-frame 40-bit word in said first optimizable frame and sending said start-of-frame 40-bit word to an output FIFO; detecting an end-of-frame 40-bit word in said first optimizable frame and temporarily retaining said end-of-frame 40-bit word while waiting for completion of optimizing said first portion of said first optimizable frame; and furnishing a polarity-correct version of said end-of-frame 40-bit word to said output FIFO for appending to first optimized data within said output FIFO, said first optimized data representing a first optimized version of said first portion of said first optimizable frame after being optimized by said optimization processor.
28. The method of claim 27 further comprising: receiving second optimized data from said optimization processor, said second optimized data representing a second optimized version of said first portion of said first optimizable frame after being optimized by said optimization processor; detecting an end-of-optimized data flag in said second optimized data; and upon detecting said end-of-optimized data flag in said second optimized data, furnishing said polarity-correct version of said end-of-frame 40-bit word to said output FIFO.
29. The method of claim 21, wherein each of the first and second protocols is one of an optical protocol, a wired protocol, or a wireless protocol, and wherein the first protocol is different from the second protocol.
30. The method of claim 21, wherein each of the first and second protocols is one of an Ethernet protocol, a Transmission Control Protocol (TCP), an Internet Protocol (IP), a TCP/IP protocol, a Fiber Channel Protocol (FCP), an Extended Attachment Unit Interface (XAUI) protocol, a Small Computer System Interface (SCSI) protocol, a storage over Ethernet (I-SCSI) protocol, a Peripheral Component Interconnect (PCI) protocol, an extended PCI (PCI-X) protocol, an Infiniband protocol, a High Speed Serial Interface (HSSI) protocol, a 10-bit interface (TBI) protocol, an Advanced Technology Attachment (ATA) protocol, an Integrated Drive Electronics (IDE) protocol, or a 64/66 protocol, and wherein the first protocol is different from the second protocol.
31. A system comprising: means for receiving a stream of data; means for separating frames in said stream of data into a first optimizable frame and a first non-optimizable frame; means for converting data in a first portion of said first optimizable frame from a first protocol to a second protocol suitable for processing by an optimization processor, wherein said first protocol specifies a first word length and said second protocol specifies a second word length different from said first word length; means for adding an indication to an end of said first portion of said first optimizable frame for the optimization processor; and means for performing, with said optimization processor and in response to said adding, at least one of compression or encryption on said first portion of said first optimizable frame.
32. The system of claim 31, wherein said means for adding said indication comprises means for adding a flag indicating an end-of-optimization to a last word of said first portion of said first optimizable frame.
33. The system of claim 32, wherein said flag is one bit long.
34. The system of claim 31 wherein said first protocol is a 10-bit interface protocol, and said means for converting comprises means for using a 10-bit/8-bit lookup table.
35. The system of claim 34 further comprising means for detecting and means for aligning a start of a primitive signal word in said first stream of data with a start of a reference 40-bit word effective to frame said primitive signal word with respect to said reference 40-bit word.
36. The system of claim 35 wherein said means for detecting comprises means for monitoring for a K28.5 10-bit word in said first stream of data.
37. The system of claim 34 further comprising: means for detecting a start-of-frame 40-bit word in said first optimizable frame and means for sending said start-of-frame 40-bit word to an output FIFO; means for detecting an end-of-frame 40-bit word in said first optimizable frame and means for temporarily retaining said end-of-frame 40-bit word while waiting for completion of optimizing said first portion of said first optimizable frame; and means for furnishing a polarity-correct version of said end-of-frame 40-bit word to said output FIFO for appending to first optimized data within said output FIFO, said first optimized data representing a first optimized version of said first portion of said first optimizable frame after being optimized by said optimization processor.
38. The system of claim 37 further comprising: means for receiving second optimized data from said optimization processor, said second optimized data representing a second optimized version of said first portion of said first optimizable frame after being optimized by said optimization processor; means for detecting an end-of-optimized data flag in said second optimized data; and means for furnishing said polarity-correct version of said end-of-frame 40-bit word to said output FIFO upon detecting said end-of-optimized data flag in said second optimized data.
39. A computing device comprising: a processor; one or more computer-readable storage devices; a data optimization engine operably associated with the processor and the one or more computer-readable storage devices, the data optimization engine comprising: a transmit interface circuit coupled to an optimization processor, said transmit interface circuit being configured for receiving a first stream of data, wherein said transmit interface circuit includes a traffic controller circuit for separating frames in said first stream of data into a first optimizable frame and a first non-optimizable frame, and an optimization front-end circuit coupled to said traffic controller circuit to receive at least a first portion of said first optimizable frame, said optimization front-end circuit including a protocol conversion circuit configured to convert data in said first portion of said first optimizable frame from a first protocol to a second protocol suitable for processing by said optimization processor, wherein said optimization front-end circuit further includes an end-of-optimization-file processing circuit, said end-of-optimization-file processing circuit configured to flag an end of said first portion of said first optimizable frame to said optimization processor, wherein said optimization processor is configured to optimize said first portion of said first optimizable frame by performing at least one of compression or encryption on said first portion of said first optimizable frame.
40. The computing device of claim 39 wherein said end-of-optimization-file processing circuit is configured to add, after said data in said first portion of said first optimizable frame is converted from said first protocol to said second protocol, an end-of-optimization-file flag to each word sent from said transmit interface circuit to said optimization processor.
41. The computing device of claim 40 wherein said end-of-optimization-file flag is one bit long.
42. The computing device of claim 39 wherein said first protocol is the 10-bit interface protocol, and said protocol conversion circuit includes a 10-bit/8-bit lookup table.
43. The computing device of claim 42 further including a frame alignment circuit for detecting and aligning a start of a primitive signal word in said first stream of data with a start of a reference 40-bit word, thereby framing said primitive signal word with respect to said reference 40-bit word.
44. The computing device of claim 43 wherein said frame alignment circuit is configured to detect said start of said primitive signal word by monitoring for a K28.5 10-bit word in said first stream of data.
45. The computing device of claim 42 further including an output FIFO coupled to said traffic controller circuit and said optimization front-end circuit, said traffic controller circuit further includes a start-of-frame handler circuit and an end-of-frame handler circuit, said start-of-frame handler circuit is configured to detect a start-of-frame 40-bit word in said first optimizable frame and to send said start-of-frame 40-bit word to said output FIFO, effectively bypassing said optimization front-end circuit, said end-of-frame handler circuit is configured to detect an end-of-frame 40-bit word in said first optimizable frame and to temporarily retain said end-of-frame 40-bit word while waiting for said optimization processor to complete optimizing said first portion of said first optimizable frame, said end-of-frame handler circuit is further configured to furnish a polarity-correct version of said end-of-frame 40-bit word to said output FIFO for appending to first optimized data within said output FIFO, said first optimized data represents a first optimized version of said first portion of said first optimizable frame after being optimized by said optimization processor.
46. The computing device of claim 45 wherein said transmit interface circuit further includes an end-of-optimized-data flag handler circuit coupled to receive second optimized data from said optimization processor, said second optimized data represents a second optimized version of said first portion of said first optimizable frame after being optimized by said optimization processor, said end-of-optimized data flag handler being configured to detect an end-of-optimized data flag in said second optimized data and signals, upon detecting said end-of-optimized data flag in said second optimized data, said end-of-frame handler circuit to furnish said polarity-correct version of said end-of-frame 40-bit word to said output FIFO.
47. The computing device of claim 39, wherein the data optimization engine is configured to be deployed in a Fiber Channel setting.
48. The computing device of claim 39, wherein the data optimization engine is configured to be interposed between a Fiber Channel controller and a serializer/deserializer.
49. The computing device of claim 39, wherein the data optimization engine is configured to work in conjunction with protocols selected from a group of protocols comprising: Ethernet protocols, Extended Attachment Unit Interface (XAUI) protocols, or I-SCSI protocols.
50. The computing device of claim 39, wherein the data optimization engine is configured to be deployed between a host device and a storage device.
51. The computing device of claim 50, wherein the data optimization engine is configured to be deployed between the host device and multiple different types of interfaces.
52. The computing device of claim 51, wherein said multiple different types of interfaces comprise interfaces selected from a group of interfaces comprising: a Fiber Channel interface, an Ethernet interface, a SCSI interface and an Infiniband interface.
53. The computing device of claim 39, wherein the data optimization engine is configured to be deployed between a CPU and a memory.
54. The computing device of claim 39, wherein the data optimization engine is configured to be deployed between networked devices.
55. The computing device of claim 54, wherein said networked devices comprise networked devices selected from a group of networked devices comprising: a network interface card, a router, or a switch.
56. The computing device of claim 39, wherein the data optimization engine is configured to be interposed between two PCI devices.
57. The computing device of claim 56, wherein the data optimization engine is configured to: process memory write transactions between the two PCI devices for possible encryption and/or compression; and process memory read transactions between the two PCI devices for possible decryption and/or decompression.
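
By way of illustration only, the one-bit end-of-optimization-file flag recited above (a flag carried with each word sent to the optimization processor, and marking the last word of the optimizable portion) may be modeled in software as follows. This is a minimal sketch and not the claimed circuit: it assumes 8-bit data words after the 10-bit/8-bit conversion, assumes the flag occupies a ninth bit above the data, and assumes the flag is set only on the final word. The names `EOF_FLAG_BIT`, `tag_words`, and `split_at_flag` are hypothetical.

```python
from typing import Iterable, List

# Assumption: the one-bit flag sits at bit 8, directly above an 8-bit data word.
EOF_FLAG_BIT = 8


def tag_words(payload: bytes) -> List[int]:
    """Attach a one-bit end-of-optimization-file flag to every 8-bit word.

    Every word carries the flag bit; it is 0 on all words except the
    last word of the payload, where it is 1.
    """
    last_index = len(payload) - 1
    tagged = []
    for index, byte in enumerate(payload):
        flag = 1 if index == last_index else 0
        tagged.append((flag << EOF_FLAG_BIT) | byte)
    return tagged


def split_at_flag(words: Iterable[int]) -> bytes:
    """Consumer side: collect data bytes up to and including the flagged word."""
    data = bytearray()
    for word in words:
        data.append(word & 0xFF)           # low 8 bits are the data word
        if (word >> EOF_FLAG_BIT) & 1:     # flag set: end of optimizable portion
            return bytes(data)
    raise ValueError("no end-of-optimization-file flag found")
```

In this model the consumer needs no separate length field: it reads 9-bit words until it sees the flag set, which mirrors the role the flag plays in telling the optimization processor where the optimizable portion ends.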